github_bot_3d_papers

Calls the arXiv API and automatically updates the paper list

Daily Updates on 3D-Related Papers

This repository automatically fetches new or updated arXiv papers in the [cs.CV] category every day, uses ChatGPT to check whether they are relevant to "3D reconstruction" or "3D generation", and lists the relevant ones below.

How It Works

  1. A GitHub Actions workflow runs daily at 09:00 UTC.
  2. It uses the script fetch_cv_3d_papers.py to:
    • Retrieve the latest arXiv papers in cs.CV.
    • Use ChatGPT to select those related to 3D reconstruction/generation.
    • Update this README.md with the new findings.
    • Send an email via 163 Mail if any relevant papers are found.
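The steps above can be sketched in Python. This is a minimal illustration, not the repository's actual fetch_cv_3d_papers.py: the arXiv Atom query is real, but the relevance check is a keyword stand-in for the ChatGPT call, and all function names are hypothetical (the real script also rewrites README.md and sends mail via 163 Mail).

```python
import re
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the arXiv feed


def fetch_latest_cs_cv(max_results=50):
    """Retrieve the most recently updated cs.CV papers from the arXiv API."""
    url = (f"{ARXIV_API}?search_query=cat:cs.CV"
           f"&sortBy=lastUpdatedDate&sortOrder=descending"
           f"&max_results={max_results}")
    with urllib.request.urlopen(url) as resp:
        feed = ET.fromstring(resp.read())
    return [
        {
            "id": e.findtext(f"{ATOM}id"),
            "title": " ".join(e.findtext(f"{ATOM}title").split()),
            "summary": " ".join(e.findtext(f"{ATOM}summary").split()),
        }
        for e in feed.findall(f"{ATOM}entry")
    ]


def is_3d_relevant(paper):
    # Stand-in for the ChatGPT relevance check: the real workflow sends the
    # title/abstract to the ChatGPT API and parses a relevance score.
    text = paper["title"] + " " + paper["summary"]
    return re.search(r"3D\s+(reconstruction|generation)", text, re.I) is not None
```

A GitHub Actions `schedule` trigger with a cron entry such as `0 9 * * *` would run this daily at 09:00 UTC.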

Paper List

Arxiv 2025-04-15

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2504.08901 HAL-NeRF: High Accuracy Localization Leveraging Neural Radiance Fields
[{'name': 'Asterios Reppas, Grigorios-Aris Cheimariotis, Panos K. Papadopoulos, Panagiotis Frasiolas, Dimitrios Zarpalas'}]
Neural Rendering 神经渲染 v2
Camera relocalization 相机重定位
Neural Radiance Fields 神经辐射场
Autonomous driving 自动驾驶
Input: Camera captures 相机捕捉
Step1: Initial pose estimation using CNN 使用CNN进行初步姿态估计
Step2: Data augmentation with NeRFs 使用NeRF数据增强
Step3: Refinement using Monte Carlo particle filter 使用蒙特卡洛粒子过滤器进行优化
Output: High accuracy camera localization 高精度相机定位
9.5 [9.5] 2504.09048 BlockGaussian: Efficient Large-Scale Scene NovelView Synthesis via Adaptive Block-Based Gaussian Splatting
[{'name': 'Yongchang Wu, Zipeng Qi, Zhenwei Shi, Zhengxia Zou'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
Gaussian Splatting
novel view synthesis
Input: Multi-view images 多视角图像
Step1: Content-aware scene partitioning 内容感知场景分割
Step2: Individual block optimization 独立块优化
Step3: Block merging and fusion 块合并与融合
Output: High-quality novel view synthesis 高质量的新视图合成
9.5 [9.5] 2504.09062 You Need a Transition Plane: Bridging Continuous Panoramic 3D Reconstruction with Perspective Gaussian Splatting
[{'name': 'Zhijie Shen, Chunyu Lin, Shujuan Huang, Lang Nie, Kang Liao, Yao Zhao'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
Gaussian splatting
panoramic images
Input: Panoramic images 全景图像
Step1: Introduce Transition Plane 引入过渡平面
Step2: Optimize 3D Gaussians in cubemap faces 在立方体面中优化3D高斯
Step3: Stitch cube faces into equirectangular panorama 将立方体面拼接成等距柱状全景图
Output: Enhanced 3D models via Gaussian splatting 通过高斯点云改进的三维模型
9.5 [9.5] 2504.09129 A Constrained Optimization Approach for Gaussian Splatting from Coarsely-posed Images and Noisy Lidar Point Clouds
[{'name': 'Jizong Peng, Tze Ho Elden Tse, Kai Xu, Wenchao Gao, Angela Yao'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
Gaussian Splatting
camera pose estimation
Input: Coarsely-posed images and noisy Lidar point clouds 粗略姿态图像和噪声激光雷达点云
Step1: Decompose camera pose into optimizations 分解相机姿态至优化步骤
Step2: Apply constrained optimization with geometric constraints 应用带有几何约束的约束优化
Step3: Perform simultaneous camera pose estimation and 3D reconstruction 进行同时的相机姿态估计和3D重建
Output: High-quality 3D reconstructions 高质量3D重建
9.5 [9.5] 2504.09149 MASH: Masked Anchored SpHerical Distances for 3D Shape Representation and Generation
[{'name': 'Changhao Li, Yu Xin, Xiaowei Zhou, Ariel Shamir, Hao Zhang, Ligang Liu, Ruizhen Hu'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D shape representation
surface reconstruction
generative model
Input: Point clouds 点云
Step1: MASH parameterization MASH 参数化
Step2: Differentiable optimization 可微优化
Step3: Surface approximation 表面近似
Output: MASH representation MASH 表示
9.5 [9.5] 2504.09328 Text To 3D Object Generation For Scalable Room Assembly
[{'name': 'Sonia Laguna, Alberto Garcia-Garcia, Marie-Julie Rakotosaona, Stylianos Moschoglou, Leonhard Helminger, Sergio Orts-Escolano'}]
3D Generation 三维生成 v2
3D generation
synthetic data
Neural Radiance Fields
Input: Text prompts 文本提示
Step1: Prompt engineering 提示工程
Step2: Synthetic data generation 合成数据生成
Step3: Integration into room layouts 房间布局集成
Output: Customizable 3D indoor scenes 自定义的三维室内场景
9.5 [9.5] 2504.09491 DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering
[{'name': 'Yexing Xu, Longguang Wang, Minglin Chen, Sheng Ao, Li Li, Yulan Guo'}]
3D Reconstruction 三维重建 v2
3D Gaussian Splatting
novel view synthesis
overfitting
dropout technique
Input: Sparse-view images 稀疏视角图像
Step1: Analyze performance degradation 分析性能退化
Step2: Implement dropout technique 实现dropout技术
Step3: Integrate edge-guided strategy 集成边缘引导策略
Output: Improved novel view synthesis outputs 改进的新视图合成输出
9.5 [9.5] 2504.09518 3D CoCa: Contrastive Learners are 3D Captioners
[{'name': 'Ting Huang, Zeyu Zhang, Yemin Wang, Hao Tang'}]
3D Captioning and Vision-Language Learning 3D 描述和视觉-语言学习 v2
3D captioning
contrastive learning
vision-language models
Input: 3D scenes with point clouds 3D场景与点云
Step1: Contrastive pretraining using visual and textual data 对图像和文本数据进行对比预训练
Step2: Multimodal decoding for caption generation 多模态解码以生成描述性标题
Step3: Joint optimization of spatial reasoning and captioning tasks 对空间推理和描述任务进行联合优化
Output: Enhanced descriptive captions for 3D scenes 改进的3D场景描述性标题
9.5 [9.5] 2504.09535 FastRSR: Efficient and Accurate Road Surface Reconstruction from Bird's Eye View
[{'name': 'Yuting Zhao, Yuheng Ji, Xiaoshuai Hao, Shuxiao Li'}]
3D Reconstruction 三维重建 v2
Road Surface Reconstruction
Autonomous Driving
Depth-Aware Projection
Input: Bird's Eye View images 鸟瞰图像
Step1: Depth-aware 3D-to-2D Projection (DAP) module 深度感知3D到2D投影模块
Step2: Spatial Attention Enhancement (SAE) module 空间注意力增强模块
Step3: Confidence Attention Generation (CAG) module 信心注意力生成模块
Output: Accurate road surface reconstruction 精确的道路表面重建
9.5 [9.5] 2504.09588 TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian Splatting
[{'name': 'Zhicong Wu, Hongbin Xu, Gang Xu, Ping Nie, Zhixin Yan, Jinkai Zheng, Liangqiong Qu, Ming Li, Liqiang Nie'}]
3D Reconstruction 三维重建 v2
3D Reconstruction
Gaussian Splatting
Semantic Fusion
Input: Sparse multi-view images 稀疏多视角图像
Step1: Utilize Diffusion Prior Depth Estimator for depth information 使用扩散先验深度估计器获取深度信息
Step2: Employ Semantic Aware Segmentation Network for semantic information 使用语义感知分割网络获取语义信息
Step3: Refine cross-view features with Multi-View Interaction Network 使用多视角交互网络改善视图间特征
Step4: Integrate representations through Text-Guided Semantic Fusion Module 通过文本引导语义融合模块整合表示
Output: High-fidelity 3D reconstructions 高保真3D重建
9.5 [9.5] 2504.09878 MCBlock: Boosting Neural Radiance Field Training Speed by MCTS-based Dynamic-Resolution Ray Sampling
[{'name': 'Yunpeng Tan, Junlin Hao, Jiangkai Wu, Liming Liu, Qingyang Li, Xinggong Zhang'}]
Neural Rendering 神经渲染 v2
Neural Radiance Field
3D reconstruction
ray-sampling
Input: Training images 训练图像
Step1: Block partitioning 块划分
Step2: Initialization of block-tree 初始化块树
Step3: Dynamic optimization 动态优化
Output: Accelerated ray-sampling for NeRF 加速的NeRF光线采样
9.5 [9.5] 2504.10001 GaussVideoDreamer: 3D Scene Generation with Video Diffusion and Inconsistency-Aware Gaussian Splatting
[{'name': 'Junlin Hao, Peiheng Wang, Haoyang Wang, Xinggong Zhang, Zongming Guo'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
video diffusion
Gaussian Splatting
Input: Video sequences 视频序列
Step1: Geometry-aware initialization 几何感知初始化
Step2: Inconsistency-Aware Gaussian Splatting 处理不一致性高斯点云
Step3: Progressive video inpainting 渐进式视频修补
Output: Enhanced 3D scenes 改进的三维场景
9.5 [9.5] 2504.10012 EBAD-Gaussian: Event-driven Bundle Adjusted Deblur Gaussian Splatting
[{'name': 'Yufei Deng, Yuanjian Wang, Rong Xiao, Chenwei Tang, Jizhe Zhou, Jiahao Fan, Deng Xiong, Jiancheng Lv, Huajin Tang'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
event camera
motion blur removal
multi-modal fusion
Input: Event streams and blurred images 事件流和模糊图像
Step1: Construct a blur loss function 建立模糊损失函数
Step2: Optimize Gaussian parameters and camera trajectories 优化高斯参数和相机轨迹
Step3: Evaluate reconstruction quality 评估重建质量
Output: High-fidelity 3D reconstruction 高保真3D重建
9.5 [9.5] 2504.10035 TT3D: Table Tennis 3D Reconstruction
[{'name': 'Thomas Gossard, Andreas Ziegler, Andreas Zell'}]
3D Reconstruction 三维重建 v2
3D reconstruction
table tennis
motion analysis
sports analytics
Input: Online table tennis match recordings 在线乒乓球比赛录像
Step1: Camera calibration using segmentation masks of objects with known geometry 通过已知几何对象的分割掩模进行相机校准
Step2: Detect 2D ball positions using a deep learning-based ball detector 使用基于深度学习的球检测器检测2D球位置
Step3: Reconstruct 3D ball trajectories from the calibrated camera using a physics-based model 利用校准相机和基于物理的模型重建3D球轨迹
Output: Full 3D reconstruction of table tennis rallies 乒乓球比赛的完整3D重建
9.5 [9.5] 2504.10117 AGO: Adaptive Grounding for Open World 3D Occupancy Prediction
[{'name': 'Peizheng Li, Shuxiao Ding, You Zhou, Qingwen Zhang, Onat Inak, Larissa Triess, Niklas Hanselmann, Marius Cordts, Andreas Zell'}]
3D Reconstruction and Modeling 三维重建 v2
3D occupancy prediction 3D占用预测
autonomous driving 自动驾驶
vision-language models 视觉语言模型
Input: Sensor inputs (images) 传感器输入(图像)
Step1: Encode images into 3D and text embeddings 将图像编码为3D和文本嵌入
Step2: Similarity-based grounding training with 3D pseudo-labels 基于相似性的3D伪标签训练
Step3: Map 3D embeddings to align with VLM-derived image embeddings 将3D嵌入映射以与VLM生成的图像嵌入对齐
Output: Improved voxelized 3D occupancy predictions 改进的体素化3D占用预测
9.5 [9.5] 2504.10331 LL-Gaussian: Low-Light Scene Reconstruction and Enhancement via Gaussian Splatting for Novel View Synthesis
[{'name': 'Hao Sun, Fenggen Yu, Huiyao Xu, Tao Zhang, Changqing Zou'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D reconstruction 三维重建
novel view synthesis 新视图合成
Input: Low-light sRGB images 低光照sRGB图像
Step1: Low-Light Gaussian Initialization Module (LLGIM) 低光照高斯初始化模块
Step2: Dual-branch Gaussian decomposition model 双分支高斯分解模型
Step3: Unsupervised optimization strategy 无监督优化策略
Output: High-quality 3D point clouds 高质量三维点云
9.5 [9.5] 2504.10466 Art3D: Training-Free 3D Generation from Flat-Colored Illustration
[{'name': 'Xiaoyan Cong, Jiayi Shen, Zekun Li, Rao Fu, Tao Lu, Srinath Sridhar'}]
3D Generation 三维生成 v2
3D generation
flat-colored images
image-to-3D models
Input: Flat-colored 2D illustrations 平面着色的2D插图
Step1: Generate multiple 3D proxy candidates 生成多个3D代理候选
Step2: Select the best candidate for 3D generation 选择最佳候选进行3D生成
Step3: Texture the generated mesh based on the original input 根据原始输入为生成的网格添加纹理
Output: Realistic 3D models 生成的真实感3D模型
9.2 [9.2] 2504.10106 SoccerNet-v3D: Leveraging Sports Broadcast Replays for 3D Scene Understanding
[{'name': 'Marc Gutiérrez-Pérez, Antonio Agudo'}]
3D Reconstruction 三维重建 v2
3D reconstruction
multi-view synchronization
camera calibration
soccer analysis
Input: Multi-view synchronized images from soccer broadcasts 多视角同步图像
Step1: Camera calibration using field-line annotations 使用场地线标注进行相机校准
Step2: Triangulation of 2D annotations to generate 3D positions 通过三角测量生成3D位置
Step3: Optimization of bounding boxes based on multi-view data 基于多视角数据优化边界框
Output: 3D ball localization annotations 3D球定位标注
9.0 [9.0] 2504.09086 RICCARDO: Radar Hit Prediction and Convolution for Camera-Radar 3D Object Detection
[{'name': 'Yunfei Long, Abhinav Kumar, Xiaoming Liu, Daniel Morris'}]
3D Object Detection 目标检测 v2
3D object detection 3D目标检测
camera-radar fusion 相机-雷达融合
autonomous vehicles 自动驾驶车辆
Input: Monocular detections 单目检测
Step1: Predict radar hit distributions 预测雷达命中分布
Step2: Match radar points with predicted distribution 匹配雷达点与预测分布
Step3: Refine detection scores using fusion refinement 通过融合优化检测得分
Output: Enhanced 3D object detection 增强的3D目标检测
9.0 [9.0] 2504.09160 SCFlow2: Plug-and-Play Object Pose Refiner with Shape-Constraint Scene Flow
[{'name': 'Qingyuan Wang, Rui Song, Jiaojiao Li, Kerui Cheng, David Ferstl, Yinlin Hu'}]
3D Object Pose Estimation 物体姿态估计 v2
6D object pose estimation
RGBD images
3D shape constraints
Input: RGBD frames RGBD帧
Step1: Introduce geometry constraints 引入几何约束
Step2: Combine rigid-motion and 3D shape prior 结合刚性运动和3D形状先验
Step3: Iterative optimization 迭代优化
Output: Accurate 6D object poses 精确的6D物体姿态
8.5 [8.5] 2504.09097 BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting
[{'name': 'Jeongwan On, Kyeonghwan Gwak, Gunyoung Kang, Junuk Cha, Soohyun Hwang, Hyein Hwang, Seungryul Baek'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
hand-object interaction
Gaussian Splatting
Input: Monocular RGB video 单目 RGB 视频
Step1: Separate optimization for hand and object Gaussians 单独优化手部和物体高斯
Step2: Joint optimization to consider interactions 共同优化以考虑交互
Output: 3D Gaussians of hands and an unknown object 输出: 手部和未知物体的 3D 高斯
8.5 [8.5] 2504.09498 EasyREG: Easy Depth-Based Markerless Registration and Tracking using Augmented Reality Device for Surgical Guidance
[{'name': 'Yue Yang, Christoph Leuze, Brian Hargreaves, Bruce Daniel, Fred Baik'}]
3D Reconstruction and Modeling 三维重建 v2
markerless registration
surgical guidance
depth sensing
augmented reality
Input: Depth data from AR device 增强现实设备的深度数据
Step1: Robust point cloud registration 稳健点云配准
Step2: Human-in-the-loop sensor error correction 人为干预的传感器误差修正
Step3: Global alignment with curvature-aware feature sampling 全局对齐与曲率感知特征采样
Step4: Local ICP refinement 局部迭代最近点优化
Output: Accurate anatomical localization and tracking 精确的解剖位置与跟踪
8.5 [8.5] 2504.09506 Pillar-Voxel Fusion Network for 3D Object Detection in Airborne Hyperspectral Point Clouds
[{'name': 'Yanze Jiang, Yanfeng Gu, Xian Li'}]
3D Object Detection 3D目标检测 v2
3D object detection 3D目标检测
hyperspectral point clouds 超光谱点云
feature fusion 特征融合
Input: Hyperspectral point clouds (HPCs) 超光谱点云
Step1: Develop pillar-voxel dual-branch encoder 发展柱-体素双分支编码器
Step2: Multi-level feature fusion mechanism for information interaction 多级特征融合机制以增强信息交互
Step3: Validate performance on airborne HPC datasets 在空中HPC数据集上验证性能
Output: Enhanced 3D object detection performance 改进的3D目标检测性能
8.5 [8.5] 2504.09540 EmbodiedOcc++: Boosting Embodied 3D Occupancy Prediction with Plane Regularization and Uncertainty Sampler
[{'name': 'Hao Wang, Xiaobao Wei, Xiaoan Zhang, Jianing Li, Chengyu Bai, Ying Li, Ming Lu, Wenzhao Zheng, Shanghang Zhang'}]
3D Reconstruction and Modeling 三维重建 v2
3D occupancy prediction
3D Gaussian Splatting
Input: Monocular RGB images 单目RGB图像
Step1: Geometry-guided Refinement Module (GRM) 几何引导细化模块
Step2: Semantic-aware Uncertainty Sampler (SUS) 语义感知不确定性采样器
Step3: Gaussian updates 高斯更新
Output: Improved 3D occupancy predictions 改进的3D占据预测
8.5 [8.5] 2504.09623 Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding
[{'name': 'Atharv Mahesh Mane, Dulanga Weerakoon, Vigneshwaran Subbaraju, Sougata Sen, Sanjay E. Sarma, Archan Misra'}]
3D Reconstruction and Modeling 三维重建 v2
3D embodied reference understanding 3D体现引用理解
data augmentation 数据增强
language grounding 语言基础
Input: Language description and pointing gesture 语言描述与指向手势
Step1: Data augmentation to insert human avatars into 3D scenes 数据增强以将人类化身插入3D场景中
Step2: Model development for 3D-ERU incorporating human localization 3D-ERU模型开发,结合人类定位
Step3: Dataset curation to create ImputeRefer dataset 数据集整理以创建ImputeRefer数据集
Output: Enhanced model for 3D embodied reference understanding 改进的3D体现引用理解模型
8.5 [8.5] 2504.09671 LightHeadEd: Relightable & Editable Head Avatars from a Smartphone
[{'name': 'Pranav Manu, Astitva Srivastava, Amit Raj, Varun Jampani, Avinash Sharma, P. J. Narayanan'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
head avatars
smartphone
polarization
real-time rendering
Input: Monocular video streams 单目视频序列
Step1: Capture polarized video streams 捕获极化视频流
Step2: Decompose surface properties 分解表面属性
Step3: Learn head avatar representation 学习头部头像表示
Output: Relightable 3D head avatars 可重光照的三维头部头像
8.5 [8.5] 2504.09789 EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise
[{'name': 'Chao Liu, Arash Vahdat'}]
Image and Video Generation 图像生成与视频生成 v2
video generation
3D consistency
motion alignment
Input: Video frames and 3D meshes 视频帧与三维网格
Step1: Generate temporally consistent noise using input video 利用输入视频生成时间一致的噪声
Step2: Attach noise as textures on 3D meshes 将噪声附加为三维网格上的纹理
Step3: Train video diffusion model with the noise 训练视频扩散模型,使用噪声
Output: Coherent video frames with 3D consistency 输出: 具有三维一致性的连贯视频帧
8.5 [8.5] 2504.09953 Efficient 2D to Full 3D Human Pose Uplifting including Joint Rotations
[{'name': 'Katja Ludwig, Yuliia Oksymets, Robin Schön, Daniel Kienzle, Rainer Lienhart'}]
3D Reconstruction and Modeling 三维重建 v2
3D human pose estimation
joint rotations
sports analytics
2D to 3D uplifting
Input: 2D keypoints from video frames 视频帧中的2D关键点
Step1: Model design for 2D-to-3D conversion 设计用于2D到3D转换的模型
Step2: Rotation representation selection 选择旋转表示方式
Step3: Evaluation of joint localization and rotation accuracy 评估关节定位和旋转精度
Output: Accurate 3D human poses including joint rotations 输出:包括关节旋转的精确3D人体姿态
8.5 [8.5] 2504.10024 Relative Illumination Fields: Learning Medium and Light Independent Underwater Scenes
[{'name': 'Mengkun She, Felix Seegräber, David Nakath, Patricia Schöntag, Kevin Köser'}]
Neural Rendering 神经渲染 v2
Neural Radiance Fields
3D Reconstruction
Underwater Imaging
Input: Images of underwater scenes 水下场景图像
Step1: Model illumination fields 建立照明场
Step2: Integrate volumetric representation 整合体积表示
Step3: Optimize the pipeline 优化整个流程
Output: Photorealistic scene representation 真实感场景表示
8.5 [8.5] 2504.10123 M2S-RoAD: Multi-Modal Semantic Segmentation for Road Damage Using Camera and LiDAR Data
[{'name': 'Tzu-Yun Tseng, Hongyu Lyu, Josephine Li, Julie Stephany Berrio, Mao Shan, Stewart Worrall'}]
Autonomous Systems and Robotics 自动驾驶 v2
Multi-modal dataset
Semantic segmentation
Road damage detection
LiDAR
Camera
Input: Camera and LiDAR data 摄像头和激光雷达数据
Step1: Data collection 数据收集
Step2: Semantic segmentation algorithms 语义分割算法
Step3: Dataset generation 数据集生成
Output: M2S-RoAD dataset M2S-RoAD数据集
8.5 [8.5] 2504.10275 LMFormer: Lane based Motion Prediction Transformer
[{'name': 'Harsh Yadav, Maximilian Schaefer, Kun Zhao, Tobias Meisen'}]
Autonomous Systems and Robotics 自动驾驶 v2
motion prediction
autonomous driving
lane-aware transformer
Input: Dynamic context and static context for trajectory prediction 动态和静态上下文用于轨迹预测
Step1: Lane-aware attention mechanism 车道感知注意机制
Step2: Graph Neural Network-based map encoding 图神经网络基础的地图编码
Step3: Iterative refinement strategies with transformer layers 通过Transformer层迭代精化策略
Output: Improved trajectory predictions 改进的轨迹预测
8.5 [8.5] 2504.10316 ESCT3D: Efficient and Selectively Controllable Text-Driven 3D Content Generation with Gaussian Splatting
[{'name': 'Huiqi Wu, Jianbo Mei, Yingjie Huang, Yining Xu, Jingjiao You, Yilong Liu, Li Yao'}]
3D Generation 三维生成 v2
3D generation
text-to-3D
multi-view integration
Input: Simple text inputs and additional conditions 简单文本输入和附加条件
Step1: Self-optimization process to refine text prompts 自我优化过程以改善文本提示
Step2: Generate 3D content based on refined prompts 根据改进的提示生成3D内容
Step3: Integrate multi-view information to enhance quality 整合多视角信息以提升质量
Output: High-quality, controllable 3D content 高质量、可控的3D内容
8.5 [8.5] 2504.10350 Benchmarking 3D Human Pose Estimation Models Under Occlusions
[{'name': 'Filipa Lino, Carlos Santiago, Manuel Marques'}]
3D Reconstruction and Modeling 三维重建 v2
3D Human Pose Estimation
occlusions
dataset synthesis
Input: Multi-camera setups 多摄像头设置
Step1: Dataset synthesis 数据集合成
Step2: Model testing 模型测试
Step3: Performance evaluation 性能评估
Output: Insights on model robustness 模型稳健性见解
8.5 [8.5] 2504.10433 MonoDiff9D: Monocular Category-Level 9D Object Pose Estimation via Diffusion Model
[{'name': 'Jian Liu, Wei Sun, Hui Yang, Jin Zheng, Zichen Geng, Hossein Rahmani, Ajmal Mian'}]
3D Reconstruction and Modeling 三维重建 v2
object pose estimation
monocular
diffusion model
3D reconstruction
autonomous systems
Input: Monocular image 单目图像
Step1: Coarse depth estimation 粗略深度估计
Step2: Point cloud generation 点云生成
Step3: Feature fusion 特征融合
Step4: Pose recovery 姿态恢复
Output: 9D object pose estimation 9D物体姿态估计
8.5 [8.5] 2504.10485 Decoupled Diffusion Sparks Adaptive Scene Generation
[{'name': 'Yunsong Zhou, Naisheng Ye, William Ljungbergh, Tianyu Li, Jiazhi Yang, Zetong Yang, Hongzi Zhu, Christoffer Petersson, Hongyang Li'}]
Image and Video Generation 图像生成 v2
scene generation
autonomous driving
data collection
Input: Scene generation using decoupled noise states 场景生成使用解耦噪声状态
Step1: Implement a noise-masking training strategy 实施噪声掩蔽训练策略
Step2: Simulate complex driving scenarios 模拟复杂驾驶场景
Step3: Integrate goal conditioning with environmental updates 将目标条件与环境更新结合
Output: Realistic and adaptive scene generation 真实的自适应场景生成
8.5 [8.5] 2504.10486 DNF-Avatar: Distilling Neural Fields for Real-time Animatable Avatar Relighting
[{'name': 'Zeren Jiang, Shaofei Wang, Siyu Tang'}]
Neural Rendering 神经渲染 v2
3D Gaussian splatting
real-time rendering
animatable avatars
Input: Monocular videos 单目视频
Step1: Knowledge distillation 知识蒸馏
Step2: Geometry and appearance estimation 几何和外观估计
Step3: Shadow computation 阴影计算
Output: Real-time relightable avatars 实时可重光照的虚拟人像
7.5 [7.5] 2504.10049 Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure
[{'name': 'Théo Gigant, Camille Guinaudeau, Frédéric Dufaux'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
multimodal presentations
automatic summarization
Input: Multimodal presentations 多模态演示
Step1: Benchmarking VLMs 基准测试VLM
Step2: Analysis of input representations 输入表示分析
Step3: Cost and performance evaluation 成本和性能评估
Output: Summarized presentations 总结的演示
6.5 [6.5] 2504.09426 BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
[{'name': 'Shengao Wang, Arjun Chandra, Aoming Liu, Venkatesh Saligrama, Boqing Gong'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
data-efficient pretraining
infant learning
Input: Infant-inspired datasets 婴儿启发的数据集
Step1: Design evaluation tasks 设计评估任务
Step2: Data distillation for synthetic augmentation 数据蒸馏以进行合成增强
Step3: Model training and evaluation 模型训练与评估
Output: Improved VLM performance 改进的 VLM 性能
6.5 [6.5] 2504.09724 A Survey on Efficient Vision-Language Models
[{'name': 'Gaurav Shinde, Anuradha Ravi, Emon Dey, Shadman Sakib, Milind Rampure, Nirmalya Roy'}]
VLM & VLA 视觉语言模型与视觉语言对齐 v2
Vision-language models
edge devices
Input: Vision-language models 视觉语言模型
Step1: Review optimization techniques 评估优化技术
Step2: Explore compact architectures 探索紧凑架构
Step3: Analyze performance-memory trade-offs 分析性能和内存的权衡
Output: Efficient VLMs for edge devices 边缘设备的高效视觉语言模型

Arxiv 2025-04-14

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2504.08100 ContrastiveGaussian: High-Fidelity 3D Generation with Contrastive Learning and Gaussian Splatting
[{'name': 'Junbang Liu, Enpei Huang, Dongxing Mao, Hui Zhang, Xinyuan Song, Yongxin Ni'}]
3D Generation 三维生成 v2
3D generation
contrastive learning
Gaussian splatting
Input: Single-view images 单视角图像
Step1: Image upscaling using super-resolution 使用超分辨率提升图像
Step2: Generate novel perspectives via a 2D diffusion model 通过2D扩散模型生成新视角
Step3: Incorporate contrastive learning with Gaussian splatting 将对比学习与高斯点云结合
Step4: Optimize the model using Quantity-Aware Triplet Loss 使用数量感知三元损失优化模型
Output: Enhanced and consistent 3D models 改进的一致性3D模型
9.5 [9.5] 2504.08252 Stereophotoclinometry Revisited
[{'name': 'Travis Driver, Andrew Vaughan, Yang Cheng, Adnan Ansar, John Christian, Panagiotis Tsiotras'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
structure from motion
photoclinometry
Input: In-situ imagery 现场图像
Step1: Keypoint detection and matching 关键点检测与匹配
Step2: Integration of photoclinometry in SfM 将光度测斜法整合到运动恢复结构中
Step3: Simultaneous optimization of parameters 同时优化参数
Output: Enhanced surface models 改进的表面模型
9.5 [9.5] 2504.08361 SN-LiDAR: Semantic Neural Fields for Novel Space-time View LiDAR Synthesis
[{'name': 'Yi Chen, Tianchen Deng, Wentao Zhao, Xiaoning Wang, Wenqian Xi, Weidong Chen, Jingchuan Wang'}]
3D Reconstruction and Modeling 三维重建 v2
LiDAR synthesis
3D reconstruction
semantic segmentation
autonomous driving
Input: LiDAR point clouds
Step1: Coarse-to-fine planar-grid feature extraction
Step2: Semantic segmentation using CNN
Step3: Joint geometric reconstruction and synthesis
Output: Realistic LiDAR scans with semantic labels
9.5 [9.5] 2504.08410 PMNI: Pose-free Multi-view Normal Integration for Reflective and Textureless Surface Reconstruction
[{'name': 'Mingzhi Pei, Xu Cao, Xiangyi Wang, Heng Guo, Zhanyu Ma'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
surface normal estimation
autonomous driving
Input: Multi-view surface normal maps 多视角表面法线图
Step1: Utilize geometric constraints from surface normals 使用表面法线的几何约束
Step2: Joint optimization of surface shape and camera poses 表面形状和相机姿态的联合优化
Step3: Evaluation of surface geometry and camera poses 评估表面几何和相机姿态
Output: High-fidelity surface reconstruction 高保真度表面重建
9.5 [9.5] 2504.08419 GeoTexBuild: 3D Building Model Generation from Map Footprints
[{'name': 'Ruizhe Wang, Junyan Yang, Qiao Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D building generation
GeoTexBuild
ControlNet
Text2Mesh
Input: Map footprints 地图轮廓
Step1: Height map generation 高度图生成
Step2: Geometry reconstruction 几何重建
Step3: Appearance stylization 外观风格化
Output: 3D building models 3D建筑模型
9.5 [9.5] 2504.08675 X2BR: High-Fidelity 3D Bone Reconstruction from a Planar X-Ray Image with Hybrid Neural Implicit Methods
[{'name': 'Gokce Guven, H. Fatih Ugurdag, Hasan F. Ates'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
bone modeling
neural implicit methods
Input: Single planar X-ray image 单个平面X射线图像
Step1: Feature extraction using ConvNeXt 特征提取
Step2: Continuous volumetric reconstruction 连续体积重建
Step3: Template-guided non-rigid registration 模板引导的非刚性配准
Output: Anatomically consistent 3D bone volume 解剖学上一致的3D骨骼体积
9.0 [9.0] 2504.08280 PNE-SGAN: Probabilistic NDT-Enhanced Semantic Graph Attention Network for LiDAR Loop Closure Detection
[{'name': 'Xiong Li, Shulei Liu, Xingning Chen, Yisong Wu, Dong Zhu'}]
Simultaneous Localization and Mapping (SLAM) 同时定位与地图构建 v2
LiDAR loop closure detection
semantic graph
SLAM
Input: LiDAR point cloud data LiDAR点云数据
Step1: Graph construction 生成图结构
Step2: Feature enhancement using NDT 特征增强
Step3: Graph Attention Network processing 使用图注意力网络进行处理
Step4: Probabilistic filtering for loop closure detection 概率滤波进行闭环检测
Output: Enhanced loop closure detection results 提升的闭环检测结果
9.0 [9.0] 2504.08412 Boosting the Class-Incremental Learning in 3D Point Clouds via Zero-Collection-Cost Basic Shape Pre-Training
[{'name': 'Chao Qi, Jianqin Yin, Meng Chen, Yingchun Niu, Yuan Sun'}]
3D Reconstruction and Modeling 三维重建 v2
3D point clouds 3D点云
class-incremental learning 类别增量学习
geometry knowledge 几何知识
Input: 3D point clouds 3D点云
Step1: Create a basic shape dataset 创建基本形状数据集
Step2: Pre-train model on geometric knowledge 在几何知识上预训练模型
Step3: Incremental learning framework implementation 增量学习框架实现
Output: Enhanced class-incremental learning capabilities 增强的类别增量学习能力
8.5 [8.5] 2504.08125 Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects
[{'name': 'Shalini Maiti, Lourdes Agapito, Filippos Kokkinos'}]
3D Generation 三维生成 v2
text-to-3D generation
evaluation metrics
vision large language models
Input: Multi-view images 多视角图像
Step1: Data integration 数据集成
Step2: Feature extraction through vLLMs 通过视觉大型语言模型提取特征
Step3: Quality assessment for 3D objects 3D对象的质量评估
Output: Evaluation scores for generated 3D objects 生成3D对象的评估分数
8.5 [8.5] 2504.08154 Investigating Vision-Language Model for Point Cloud-based Vehicle Classification
[{'name': 'Yiqiao Li, Jie Wei, Camille Kamga'}]
Point Cloud Processing 点云处理 v2
vision-language models
point cloud processing
autonomous driving
Input: Point cloud data from LiDAR captures 激光雷达采集的点云数据
Step1: Preprocessing pipeline to adapt point clouds for VLMs 预处理管道调整点云以适应VLM
Step2: Point cloud registration and classification 点云配准与分类
Step3: Model evaluation and experimentation 模型评估与实验
Output: Efficient classification results 高效分类结果
8.5 [8.5] 2504.08307 DSM: Building A Diverse Semantic Map for 3D Visual Grounding
[{'name': 'Qinghongbing Xie, Zijian Liang, Long Zeng'}]
3D Reconstruction and Modeling 3D重建与建模 v2
3D Visual Grounding 3D视觉定位
Semantic Map 语义地图
Vision-Language Models 视觉语言模型
Input: Multi-view images and VLM data 多视角图像和VLM数据
Step1: Construct Diverse Semantic Map (DSM) 构建多样语义地图
Step2: Enhance scene understanding based on DSM 基于DSM增强场景理解
Step3: Implement DSM-Grounding for 3D Visual Grounding 实现DSM-Grounding进行3D视觉定位
Output: Improved performance in robotic tasks 改进机器人任务的表现
8.5 [8.5] 2504.08348 Geometric Consistency Refinement for Single Image Novel View Synthesis via Test-Time Adaptation of Diffusion Models
[{'name': 'Josef Bengtson, David Nilsson, Fredrik Kahl'}]
Image Generation 图像生成 v2
novel view synthesis
geometric consistency
diffusion models
Input: Single image and relative pose 单张图像和相对姿态
Step1: Generate candidate image 使用扩散模型生成候选图像
Step2: Compute matching points 计算匹配点
Step3: Formulate geometric consistency loss 构建几何一致性损失
Step4: Optimize noise to minimize loss 优化噪声以最小化损失
Output: Geometrically consistent image 输出: 几何一致性图像
8.5 [8.5] 2504.08414 Adversarial Examples in Environment Perception for Automated Driving (Review)
[{'name': 'Jun Yan, Huilin Yin'}]
Autonomous Systems and Robotics 自动驾驶 v2
adversarial examples
automated driving
adversarial robustness
Input: Overview of adversarial examples in deep learning applications for automated driving
Step1: Literature review of adversarial robustness and its methods
Step2: Analysis of adversarial impact on different tasks in automated driving
Step3: Discussion of future directions and research needs in adversarial robustness
Output: Comprehensive survey of adversarial examples in automated driving context
8.5 [8.5] 2504.08473 Cut-and-Splat: Leveraging Gaussian Splatting for Synthetic Data Generation
[{'name': 'Bram Vanherle, Brent Zoomers, Jeroen Put, Frank Van Reeth, Nick Michiels'}]
3D Reconstruction and Modeling 三维重建 v2
Gaussian Splatting
3D models
synthetic data generation
Input: Video of the target object 目标对象的视频
Step1: Training Gaussian Splatting model 训练高斯点云模型
Step2: Object extraction from video 从视频中提取对象
Step3: Rendering object onto background 物体渲染到背景上
Output: High-quality synthetic images 生成高质量合成图像
8.5 [8.5] 2504.08551 Shadow Erosion and Nighttime Adaptability for Camera-Based Automated Driving Applications
[{'name': 'Mohamed Sabry, Gregory Schroeder, Joshua Varughese, Cristina Olaverri-Monreal'}]
Image Generation 图像生成 v2
image enhancement
autonomous driving
shadow mitigation
nighttime visibility
Input: Images from RGB cameras RGB相机的图像
Step1: Apply Shadow Erosion to reduce shadows 应用阴影侵蚀以减少阴影
Step2: Implement Nighttime Adaptability for improved visibility 实施夜间适应性以提高可见性
Step3: Evaluate using visual perception quality metrics 使用视觉感知质量指标进行评估
Output: Enhanced images for autonomous driving applications 输出:用于自动驾驶应用的增强图像
8.5 [8.5] 2504.08581 FMLGS: Fast Multilevel Language Embedded Gaussians for Part-level Interactive Agents
[{'name': 'Xin Tan, Yuzhou Ji, He Zhu, Yuan Xie'}]
Neural Rendering 神经渲染 v2
3D Gaussian Splatting
3D scene modeling
language-embedded radiance fields
Input: Posed images 设定图像
Step1: Extract SAM masks 提取SAM掩码
Step2: Filter redundant masks 过滤冗余掩码
Step3: Semantic mapping 语义映射
Output: Part-level localization results 部件级定位结果
8.5 [8.5] 2504.08736 GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
[{'name': 'Tianwei Xiong, Jun Hao Liew, Zilong Huang, Jiashi Feng, Xihui Liu'}]
Image Generation 图像生成 v2
image reconstruction
autoregressive generation
tokenizers
semantic regularization
Input: Visual tokenizers 视觉标记器
Step1: Identifying latent space complexity 确定潜在空间复杂性
Step2: Proposing semantic regularization 提出语义正则化
Step3: Scaling tokenizers with key practices 采用关键实践扩大标记器
Output: Enhanced image reconstruction and generation 改进的图像重建与生成
8.0 [8.0] 2504.08452 Road Grip Uncertainty Estimation Through Surface State Segmentation
[{'name': 'Jyri Maanpää, Julius Pesonen, Iaroslav Melekhov, Heikki Hyyti, Juha Hyyppä'}]
Autonomous Driving 自动驾驶 v2
Grip Uncertainty Prediction 抓地力不确定性预测
Autonomous Driving 自动驾驶
Surface State Segmentation 表面状态分割
Input: Road surface state segmentation strategy 路面状态分割策略
Step1: Benchmark uncertainty prediction methods 基准不确定性预测方法
Step2: Estimate pixel-wise grip probability distribution 估计逐像素的抓地力概率分布
Step3: Evaluate robustness of predictions 评估预测的稳健性
Output: Enhanced grip uncertainty predictions 改进的抓地力不确定性预测
8.0 [8.0] 2504.08540 Datasets for Lane Detection in Autonomous Driving: A Comprehensive Review
[{'name': 'Jörg Gamerdinger, Sven Teufel, Oliver Bringmann'}]
Autonomous Systems and Robotics 自动驾驶 v2
lane detection
autonomous driving
datasets
Input: Lane detection datasets 车道检测数据集
Step1: Comprehensive review of datasets 数据集的综合评审
Step2: Classification based on key factors 基于关键因素的分类
Step3: Identification of challenges and gaps 挑战和研究空白的识别
Output: Recommendations for dataset improvement 数据集改进建议
7.5 [7.5] 2504.08422 CMIP-CIL: A Cross-Modal Benchmark for Image-Point Class Incremental Learning
[{'name': 'Chao Qi, Jianqin Yin, Ren Zhang'}]
Image and Video Generation 图像生成 v2
incremental learning
cross-modal learning
3D vision
Input: 2D images and 3D point clouds 2D图像和3D点云
Step1: Generating masked point clouds 生成遮罩点云
Step2: Creating multi-view images 生成多视图图像
Step3: Contrastive learning framework 对比学习框架
Output: Generalizable image-point correspondence 输出:可推广的图像点对应关系

Arxiv 2025-04-11

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2504.07335 DLTPose: 6DoF Pose Estimation From Accurate Dense Surface Point Estimates
[{'name': 'Akash Jadhav, Michael Greenspan'}]
3D Object Pose Estimation 物体姿态估计 v2
6DoF pose estimation
RGB-D images
3D surface estimation
Input: RGB-D images RGB-D图像
Step1: Predict per-pixel radial distances 为每个像素点预测径向距离
Step2: Use Direct Linear Transform for 3D surface estimation 使用直接线性变换进行3D表面估计
Step3: Keypoint ordering for symmetry handling 处理对称性关键点排序
Output: Accurate 6DoF object pose estimation 输出:准确的6自由度对象姿态估计
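The DLT step above recovers 3D keypoints from predicted per-pixel radial distances. A closely related, simpler formulation is linearised multilateration: subtract one range equation from the rest so the unknown point appears linearly. This toy sketch illustrates that idea only; it is not the paper's exact DLT formulation:

```python
import numpy as np

def trilaterate(anchors, ranges):
    """Recover a 3D point from distances to known 3D anchor points.

    Linearises |x - p_i|^2 = r_i^2 by subtracting the first equation,
    then solves the resulting overdetermined linear system.
    anchors: (N, 3), ranges: (N,), with N >= 4 non-coplanar anchors.
    """
    p0, r0 = anchors[0], ranges[0]
    A = 2.0 * (anchors[1:] - p0)
    b = (np.sum(anchors[1:] ** 2, axis=1) - np.sum(p0 ** 2)
         + r0 ** 2 - ranges[1:] ** 2)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Synthetic check: distances from 5 random anchors to a known point.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(5, 3))
target = np.array([0.3, -0.2, 1.1])
ranges = np.linalg.norm(anchors - target, axis=1)
est = trilaterate(anchors, ranges)   # should recover `target`
```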
9.5 [9.5] 2504.07370 View-Dependent Uncertainty Estimation of 3D Gaussian Splatting
[{'name': 'Chenyu Han, Corentin Dumery'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
uncertainty estimation
Gaussian Splatting
Input: 3D Gaussian Splatting (3DGS) data 3D高斯点云数据
Step1: Model uncertainty as view-dependent features 将不确定性建模为视角依赖特征
Step2: Use spherical harmonics for uncertainty representation 使用球谐函数表示不确定性
Step3: Integrate into traditional 3DGS pipeline 集成到传统3DGS流程中
Output: Improved uncertainty estimation for 3D reconstruction 改进的3D重建不确定性估计
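Step2 stores uncertainty as spherical-harmonics coefficients so it can vary with viewing direction, the same mechanism standard 3DGS uses for view-dependent color. A toy degree-1 evaluation; the coefficients here are made up for illustration:

```python
import numpy as np

def sh_basis_deg1(d):
    """Real spherical-harmonics basis up to degree 1 for a direction d,
    using the standard real-SH normalisation constants."""
    x, y, z = d / np.linalg.norm(d)
    return np.array([0.282095,            # Y_0^0
                     0.488603 * y,        # Y_1^{-1}
                     0.488603 * z,        # Y_1^0
                     0.488603 * x])       # Y_1^1

def view_dependent_value(coeffs, view_dir):
    """Evaluate a per-Gaussian scalar (e.g. an uncertainty value)
    stored as SH coefficients, for a given viewing direction."""
    return float(coeffs @ sh_basis_deg1(view_dir))

coeffs = np.array([1.0, 0.0, 0.5, 0.0])   # illustrative coefficients
u = view_dependent_value(coeffs, np.array([0.0, 0.0, 1.0]))
```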
9.5 [9.5] 2504.07524 DGOcc: Depth-aware Global Query-based Network for Monocular 3D Occupancy Prediction
[{'name': 'Xu Zhao, Pengju Zhang, Bo Liu, Yihong Wu'}]
3D Occupancy Prediction 3D占用预测 v2
3D occupancy prediction 3D占用预测
autonomous driving 自动驾驶
depth context features 深度上下文特征
Input: 2D images and prior depth maps 2D图像和先验深度图
Step1: Extract depth context features 提取深度上下文特征
Step2: Develop the Global Query-based Module 开发全局查询模块
Step3: Apply Hierarchical Supervision Strategy 应用分层监督策略
Output: Monocular 3D occupancy predictions 单目3D占用预测
9.5 [9.5] 2504.07943 HoloPart: Generative 3D Part Amodal Segmentation
[{'name': 'Yunhan Yang, Yuan-Chen Guo, Yukun Huang, Zi-Xin Zou, Zhipeng Yu, Yangguang Li, Yan-Pei Cao, Xihui Liu'}]
3D Reconstruction and Modeling 三维重建 v2
3D part segmentation
shape completion
3D reconstruction
Input: Incomplete part segments 不完整的部分段
Step1: Initial part segmentation 初始部分分割
Step2: HoloPart diffusion-based model application 应用HoloPart扩散模型
Output: Complete 3D parts 完整的三维部分
9.5 [9.5] 2504.07958 Detect Anything 3D in the Wild
[{'name': 'Hanxue Zhang, Haoran Jiang, Qingsong Yao, Yanan Sun, Renrui Zhang, Hao Zhao, Hongyang Li, Hongzi Zhu, Zetong Yang'}]
3D Reconstruction and Modeling 三维重建 v2
3D detection
zero-shot learning
autonomous driving
Input: Monocular images 单目图像
Step1: Feature alignment 特征对齐
Step2: Knowledge transfer 知识转移
Step3: Model evaluation 模型评估
Output: Generalized 3D detection results 泛化的3D检测结果
9.5 [9.5] 2504.07961 Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
[{'name': 'Zeren Jiang, Chuanxia Zheng, Iro Laina, Diane Larlus, Andrea Vedaldi'}]
3D Reconstruction and Modeling 三维重建 v2
4D reconstruction
monocular video
video generators
dynamic scenes
Input: Monocular videos 单目视频
Step1: Train video diffusion models on synthetic data 使用合成数据训练视频扩散模型
Step2: Predict geometric modalities including point, disparity, and ray maps 预测几何模型,包括点图、视差图和光线图
Step3: Multi-modal alignment and fusion at inference time 推理时的多模式对齐与融合
Output: 4D reconstruction of dynamic scenes 4D动态场景重建
9.2 [9.2] 2504.07853 V2V3D: View-to-View Denoised 3D Reconstruction for Light-Field Microscopy
[{'name': 'Jiayin Zhao, Zhenqi Fu, Tao Yu, Hui Qiao'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
light field microscopy
denoising
wave optics
Input: Light field images 光场图像
Step1: Framework for simultaneous denoising and 3D reconstruction 同时去噪与三维重建框架
Step2: View-to-view paired images processing 视图对视图配对图像处理
Step3: Feature alignment using wave-optics-based technique 使用波光学基础上的特征对齐技术
Output: High-quality 3D reconstructed volumes 高质量3D重建体积
9.0 [9.0] 2504.07334 Objaverse++: Curated 3D Object Dataset with Quality Annotations
[{'name': 'Chendi Lin, Heshan Liu, Qunshu Lin, Zachary Bright, Shitao Tang, Yihui He, Minghao Liu, Ling Zhu, Cindy Le'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D reconstruction 三维重建
quality annotation 质量注释
generative models 生成模型
Input: Annotated 3D object dataset 经过注释的三维物体数据集
Step1: Manual annotation of 10,000 objects 对1万件物品进行手动注释
Step2: Training of a neural network to automate tagging 训练神经网络以自动标记
Step3: Evaluation of datasets based on quality attributes 根据质量属性评估数据集
Output: Enhanced dataset of 500,000 3D models 改进的50万三维模型数据集
8.5 [8.5] 2504.07260 Quantifying Epistemic Uncertainty in Absolute Pose Regression
[{'name': 'Fereidoon Zangeneh, Amit Dekel, Alessandro Pieropan, Patric Jensfelt'}]
Simultaneous Localization and Mapping (SLAM) 同时定位与地图构建 v2
absolute pose regression
visual localization
uncertainty estimation
Input: Image data 影像数据
Step1: Train absolute pose regression model 训练绝对姿态回归模型
Step2: Quantify epistemic uncertainty 量化认识不确定性
Step3: Validate predictions 验证预测
Output: Confidence measures in predictions 预测中的置信度量度
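A common way to quantify epistemic uncertainty, shown here only as an illustrative stand-in for the paper's method, is the disagreement across an ensemble of independently trained models:

```python
import numpy as np

def epistemic_variance(models, x):
    """Estimate epistemic uncertainty as the per-dimension variance of
    predictions from independently trained models.

    models: list of callables mapping an input to a prediction vector.
    """
    preds = np.stack([m(x) for m in models])
    return preds.mean(axis=0), preds.var(axis=0)

# Toy ensemble of linear "pose regressors" with slightly perturbed weights.
rng = np.random.default_rng(1)
weights = [np.eye(3) + 0.01 * rng.normal(size=(3, 3)) for _ in range(8)]
models = [lambda x, W=W: W @ x for W in weights]
mean, var = epistemic_variance(models, np.array([1.0, 2.0, 3.0]))
```

High variance flags inputs where the models genuinely disagree, i.e. where the prediction should not be trusted.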
8.5 [8.5] 2504.07375 Novel Diffusion Models for Multimodal 3D Hand Trajectory Prediction
[{'name': 'Junyi Ma, Wentao Bao, Jingyi Xu, Guanzhong Sun, Xieyuanli Chen, Hesheng Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D hand trajectory prediction
multimodal learning
robot manipulation
autonomous systems
Input: Multimodal data including 2D RGB images and 3D point clouds 多模态数据包含2D RGB图像和3D点云
Step1: Data processing to extract features from each modality 数据处理以提取每种模态的特征
Step2: Integration of multimodal features using a hybrid Mamba-Transformer module 使用混合Mamba-Transformer模块集成多模态特征
Step3: Prediction of future hand trajectories and camera egomotion 预测未来手部轨迹和相机自运动
Output: Future 3D hand trajectories and corresponding egomotion 未来的3D手部轨迹和相应的自运动
8.5 [8.5] 2504.07382 Model Discrepancy Learning: Synthetic Faces Detection Based on Multi-Reconstruction
[{'name': 'Qingchao Jiang, Zhishuo Xu, Zhiying Zhu, Ning Chen, Haoyue Wang, Zhongjie Ba'}]
3D Reconstruction and Modeling 三维重建 v2
synthetic face detection
reconstruction discrepancies
Input: Multi-reconstruction of synthetic images 多重重建合成图像
Step1: Analyze reconstruction discrepancies 分析重建差异
Step2: Develop a Multi-Reconstruction-based detector 开发基于多重重建的检测器
Step3: Evaluate detection performance 评估检测性能
Output: Accurate differentiation between real and synthetic faces 输出: 精确区分真实和合成面孔
8.5 [8.5] 2504.07418 ThermoStereoRT: Thermal Stereo Matching in Real Time via Knowledge Distillation and Attention-based Refinement
[{'name': 'Anning Hu, Ang Li, Xirui Jin, Danping Zou'}]
Stereo Vision 立体视觉 v2
thermal stereo matching
3D reconstruction
autonomous systems
Input: Rectified thermal stereo images 热成像立体图像
Step1: Feature extraction 特征提取
Step2: Cost volume construction 成本体积构建
Step3: Disparity estimation 视差估计
Step4: Disparity map refinement 视差图修正
Output: Final disparity map 最终视差图
8.5 [8.5] 2504.07491 Kimi-VL Technical Report
[{'name': 'Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, Congcong Wang, Dehao Zhang, Dikang Du, Dongliang Wang, Enming Yuan, Enzhe Lu, Fang Li, Flood Sung, Guangda Wei, Guokun Lai, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, Haoning Wu, Haotian Yao, Haoyu Lu, Heng Wang, Hongcheng Gao, Huabin Zheng, Jiaming Li, Jianlin Su, Jianzhou Wang, Jiaqi Deng, Jiezhong Qiu, Jin Xie, Jinhong Wang, Jingyuan Liu, Junjie Yan, Kun Ouyang, Liang Chen, Lin Sui, Longhui Yu, Mengfan Dong, Mengnan Dong, Nuo Xu, Pengyu Cheng, Qizheng Gu, Runjie Zhou, Shaowei Liu, Sihan Cao, Tao Yu, Tianhui Song, Tongtong Bai, Wei Song, Weiran He, Weixiao Huang, Weixin Xu, Xiaokun Yuan, Xingcheng Yao, Xingzhe Wu, Xinxing Zu, Xinyu Zhou, Xinyuan Wang, Y. Charles, Yan Zhong, Yang Li, Yangyang Hu, Yanru Chen, Yejie Wang, Yibo Liu, Yibo Miao, Yidao Qin, Yimin Chen, Yiping Bao, Yiqin Wang, Yongsheng Kang, Yuanxin Liu, Yulun Du, Yuxin Wu, Yuzhi Wang, Yuzi Yan, Zaida Zhou, Zhaowei Li, Zhejun Jiang, Zheng Zhang, Zhilin Yang, Zhiqi Huang, Zihao Huang, Zijia Zhao, Ziwei Chen'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
vision-language model
multimodal reasoning
long-context processing
Input: Vision-language inputs 视觉-语言输入
Step1: MoE model design MoE模型设计
Step2: Long-context processing 长上下文处理
Step3: Multimodal reasoning 多模态推理
Output: Advanced VLM capabilities 高级VLM能力
8.5 [8.5] 2504.07603 RASMD: RGB And SWIR Multispectral Driving Dataset for Robust Perception in Adverse Conditions
[{'name': 'Youngwan Jin, Michal Kovac, Yagiz Nalcakan, Hyeongjin Ju, Hanbin Song, Sanghyeop Yeo, Shiho Kim'}]
Autonomous Driving 自动驾驶 v2
RGB
SWIR
autonomous driving
dataset
object detection
Input: RGB and SWIR image pairs RGB和SWIR图像对
Step1: Dataset collection 数据集收集
Step2: Annotation for object detection and translation 对对象检测和翻译的注释
Step3: Experimental evaluation 实验评估
Output: Benchmark for multispectral driving dataset 多光谱驾驶数据集基准
8.5 [8.5] 2504.07615 VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model
[{'name': 'Haozhan Shen, Peng Liu, Jingcheng Li, Chunxin Fang, Yibo Ma, Jiajia Liao, Qiaoli Shen, Zilun Zhang, Kangjia Zhao, Qianqian Zhang, Ruochen Xu, Tiancheng Zhao'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models (VLMs) 视觉语言模型
Reinforcement Learning 强化学习
Input: Vision-language tasks 视觉语言任务
Step1: Rule-based reward formulation 基于规则的奖励制定
Step2: Model training with reinforcement learning 使用强化学习进行模型训练
Step3: Performance evaluation in visual tasks 视觉任务中的性能评估
Output: Improved VLM performance 改进的视觉语言模型性能
8.5 [8.5] 2504.07949 InteractAvatar: Modeling Hand-Face Interaction in Photorealistic Avatars with Deformable Gaussians
[{'name': 'Kefan Chen, Sergiu Oprea, Justin Theiss, Sreyas Mohan, Srinath Sridhar, Aayush Prakash'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D modeling
hand-face interaction
Avatar
realistic animation
Gaussian Splatting
Input: Monocular or multi-view videos 单目或多视角视频
Step1: Gaussian kernel anchoring 高斯核锚定
Step2: Pose-dependent animation 动态动画依赖于姿势
Step3: Interaction modeling 互动建模
Output: Photorealistic avatar animation 照片级真实的虚拟人动画
8.5 [8.5] 2504.07955 BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation
[{'name': 'Yuanhong Yu, Xingyi He, Chen Zhao, Junhao Yu, Jiaqi Yang, Ruizhen Hu, Yujun Shen, Xing Zhu, Xiaowei Zhou, Sida Peng'}]
3D Reconstruction and Modeling 三维重建 v2
3D object pose estimation 3D物体姿态估计
sparse-view reconstruction 稀疏视图重建
Input: Sparse-view RGB images 稀疏视图RGB图像
Step1: Recover 3D bounding box from sparse views 从稀疏视图恢复3D边界框
Step2: Predict 2D projections of the bounding box corners in the query view 在查询视图中预测边界框角点的2D投影
Output: 6DoF object pose estimation 6自由度物体姿态估计
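Recovering a 6DoF pose from predicted 2D projections of known 3D box corners is a classic resection problem. As a hedged stand-in for a proper PnP solver, this toy DLT estimates the full 3x4 projection matrix from 2D-3D correspondences:

```python
import numpy as np

def dlt_projection(pts3d, pts2d):
    """Estimate a 3x4 projection matrix from >= 6 2D-3D correspondences
    via the Direct Linear Transform (SVD of the stacked constraints)."""
    A = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        Xh = [X, Y, Z, 1.0]
        A.append(Xh + [0, 0, 0, 0] + [-u * c for c in Xh])
        A.append([0, 0, 0, 0] + Xh + [-v * c for c in Xh])
    _, _, vt = np.linalg.svd(np.asarray(A))
    return vt[-1].reshape(3, 4)      # null-space vector, up to scale

# Synthetic check: project known 3D points with a known P, re-estimate it.
rng = np.random.default_rng(2)
P = np.hstack([np.eye(3), np.array([[0.1], [0.2], [2.0]])])
pts3d = rng.uniform(-1.0, 1.0, size=(8, 3))
homo = np.hstack([pts3d, np.ones((8, 1))]) @ P.T
pts2d = homo[:, :2] / homo[:, 2:3]
P_est = dlt_projection(pts3d, pts2d)
P_est /= P_est[-1, -1]               # fix the projective scale and sign
```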
7.5 [7.5] 2504.07542 SydneyScapes: Image Segmentation for Australian Environments
[{'name': 'Hongyu Lyu, Julie Stephany Berrio, Mao Shan, Stewart Worrall'}]
Autonomous Driving 自动驾驶 v2
image segmentation
autonomous vehicles
dataset
machine learning
Input: Collection of urban images from Sydney 澳大利亚悉尼的城市图像
Step1: Image segmentation task definition 图像分割任务定义
Step2: Annotation of images with semantic, instance, and panoptic labels 对图像进行语义、实例和全景标签注释
Step3: Benchmarking with state-of-the-art algorithms 基于最新算法进行基准测试
Output: Dataset for AV perception algorithm development 自动驾驶感知算法开发的数据集

Arxiv 2025-04-10

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2504.06716 GSta: Efficient Training Scheme with Siestaed Gaussians for Monocular 3D Scene Reconstruction
[{'name': 'Anil Armagan, Albert Saà-Garriga, Bruno Manganelli, Kyuwon Kim, M. Kerim Yucel'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
Gaussian Splatting
efficiency
autonomous driving
Input: Monocular images 单目图像
Step1: Gaussian identification 高斯识别
Step2: Freezing converged Gaussians 冻结收敛高斯
Step3: Early stopping mechanism 提早停止机制
Output: Efficiently trained 3D reconstruction model 高效训练的3D重建模型
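The freezing idea in Step2 can be illustrated with a simple gradient-magnitude criterion: stop updating Gaussians whose recent gradients have gone quiet. The window and threshold below are invented for the sketch; GSta's actual rule may differ:

```python
import numpy as np

def select_frozen(grad_history, window=5, threshold=1e-4):
    """Mark Gaussians as converged when their recent average gradient
    magnitude falls below a threshold; frozen Gaussians are then
    skipped in later optimisation steps.

    grad_history: (T, N) per-step gradient magnitudes for N Gaussians.
    Returns a boolean mask of shape (N,), True = freeze.
    """
    recent = grad_history[-window:]
    return recent.mean(axis=0) < threshold

# Two Gaussians: the first has converged, the second is still moving.
hist = np.array([[1e-5, 1e-2]] * 6)
mask = select_frozen(hist)   # -> [True, False]
```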
9.5 [9.5] 2504.06719 Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding
[{'name': 'Pedro Hermosilla, Christian Stippel, Leon Sick'}]
3D Reconstruction and Modeling 三维重建 v2
3D scene understanding 3D场景理解
self-supervised learning 自监督学习
masked modeling 掩模建模
Input: Hierarchical 3D models 分层的3D模型
Step1: Multi-resolution feature sampling 多分辨率特征采样
Step2: Hierarchical masking approach 分层掩码方法
Step3: Feature reconstruction 特征重建
Output: Semantic-aware 3D features 语义感知的3D特征
9.5 [9.5] 2504.06801 MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection
[{'name': 'Rishubh Parihar, Srinjay Sarkar, Sarthak Vora, Jogendra Kundu, R. Venkatesh Babu'}]
3D Reconstruction and Modeling 三维重建 v2
3D object detection 3D物体检测
data augmentation 数据增广
monocular detection 单目检测
Input: Background scene 背景场景
Step1: Learn distribution of plausible 3D bounding boxes 学习合理的三维边界框的分布
Step2: Render realistic objects 渲染真实的物体
Step3: Place objects according to learned distribution 根据学习的分布放置物体
Output: Enhanced monocular 3D detection performance 改进的单目3D检测性能
9.5 [9.5] 2504.06815 SVG-IR: Spatially-Varying Gaussian Splatting for Inverse Rendering
[{'name': 'Hanxiao Sun, YuPeng Gao, Jin Xie, Jian Yang, Beibei Wang'}]
3D Reconstruction 三维重建 v2
inverse rendering
3D Gaussian Splatting
novel view synthesis
relighting
Input: Images for 3D asset reconstruction 用于三维资产重建的图像
Step1: Apply Spatially-varying Gaussian representation 应用空间变化高斯表示
Step2: Integrate physically-based indirect lighting model 集成基于物理的间接照明模型
Step3: Evaluate NVS and relighting quality 评估新视角合成和重光照质量
Output: Enhanced rendering quality 改进的渲染质量
9.5 [9.5] 2504.06827 IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments
[{'name': 'Can Zhang, Gim Hee Lee'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D Reconstruction 三维重建
Interactive Affordance 交互赋能
Articulated Objects 有关节的物体
Input: Multi-view posed images 多视角图像
Step1: 3D model construction 3D模型构建
Step2: Hierarchical feature field construction 层次特征场构建
Step3: Semantic-guided mask association across states 语义引导的掩码关联
Step4: Affordance prediction 赋能预测
Step5: Motion recovery 运动恢复
Output: Interactive affordance system 可交互的赋能系统
9.5 [9.5] 2504.06978 Wheat3DGS: In-field 3D Reconstruction, Instance Segmentation and Phenotyping of Wheat Heads with Gaussian Splatting
[{'name': 'Daiwei Zhang, Joaquin Gajardo, Tomislav Medic, Isinsu Katircioglu, Mike Boss, Norbert Kirchgessner, Achim Walter, Lukas Roth'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
instance segmentation
phenotyping
Gaussian Splatting
Input: Multi-view RGB images 多视角RGB图像
Step1: Data integration 数据集成
Step2: Instance segmentation using Segment Anything Model (SAM) 基于SAM的实例分割
Step3: 3D reconstruction using 3D Gaussian Splatting 采用3D高斯点云进行3D重建
Output: Detailed 3D models of wheat heads 改进的小麦头三维模型
9.5 [9.5] 2504.06982 SIGMAN:Scaling 3D Human Gaussian Generation with Millions of Assets
[{'name': 'Yuhang Yang, Fengqi Liu, Yixing Lu, Qin Zhao, Pingyu Wu, Wei Zhai, Ran Yi, Yang Cao, Lizhuang Ma, Zheng-Jun Zha, Junting Dong'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D reconstruction
3D human generation
Gaussian modeling
Input: Multi-view images 多视角图像
Step1: Latent space compression 潜在空间压缩
Step2: Gaussian representation generation 高斯表示生成
Step3: Large-scale dataset construction 大规模数据集构建
Output: High-quality 3D human Gaussians 高质量3D人类高斯模型
9.5 [9.5] 2504.07025 Glossy Object Reconstruction with Cost-effective Polarized Acquisition
[{'name': 'Bojian Wu, Yifan Peng, Ruizhen Hu, Xiaowei Zhou'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
polarization imaging
neural rendering
Input: Multi-view polarization images 多视角偏振图像
Step1: Data acquisition 数据采集
Step2: Modeling polarimetric BRDF using neural implicit fields 使用神经隐式场建模偏振BRDF
Step3: Minimizing rendering loss 最小化渲染损失
Output: High-fidelity geometry and radiance decomposition 高保真几何体和辐射分解
9.2 [9.2] 2504.06397 PromptHMR: Promptable Human Mesh Recovery
[{'name': 'Yufu Wang, Yu Sun, Priyanka Patel, Kostas Daniilidis, Michael J. Black, Muhammed Kocabas'}]
3D Reconstruction and Modeling 三维重建 v2
human pose estimation
3D shape recovery
Input: Images containing people 处理包含人物的图像
Step1: Utilize bounding boxes or masks 利用边界框或掩模
Step2: Extract features using vision transformer 使用视觉变换器提取特征
Step3: Process prompts and image data 处理提示和图像数据
Output: Estimated human pose and shape 估计的人体姿态和形状
8.5 [8.5] 2504.06292 Temporal-contextual Event Learning for Pedestrian Crossing Intent Prediction
[{'name': 'Hongbin Liang, Hezhe Qiao, Wei Huang, Qizhou Wang, Mingsheng Shang, Lin Chen'}]
Autonomous Systems and Robotics 自动驾驶 v2
pedestrian crossing intention
autonomous driving
temporal contextual learning
Input: Observed video frames 观察到的视频帧
Step1: Temporal merging to cluster key events 时间聚合以聚类关键事件
Step2: Contextual attention to aggregate features 上下文注意力以聚合特征
Output: Enhanced pedestrian crossing intent prediction 改进的行人过马路意图预测
8.5 [8.5] 2504.06464 Implementation of a Zed 2i Stereo Camera for High-Frequency Shoreline Change and Coastal Elevation Monitoring
[{'name': 'José A. Pilartes-Congo, Matthew Kastl, Michael J. Starek, Marina Vicens-Miquel, Philippe Tissot'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
coastal monitoring
Input: Multi-view images 多视角图像
Step1: Intrinsic camera calibration 相机内在参数校准
Step2: Georectification and registration of acquired imagery and point cloud 获取图像和点云的几何校正与配准
Step3: Generation of Digital Surface Models (DSM) 生成数字表面模型(DSM)
Output: 3D point cloud and georectified imagery 3D点云和几何校正图像
8.5 [8.5] 2504.06527 TSP-OCS: A Time-Series Prediction for Optimal Camera Selection in Multi-Viewpoint Surgical Video Analysis
[{'name': 'Xinyu Liu, Xiaoguang Lin, Xiang Liu, Yong Yang, Hongqian Wang, Qilong Sun'}]
Multi-view and Stereo Vision 多视角立体视觉 v2
multi-viewpoint camera selection
surgical video analysis
Input: Multi-view surgical videos 多视角手术视频
Step1: Feature extraction 特征提取
Step2: Time series prediction 时间序列预测
Step3: Camera selection 相机选择
Output: Optimal camera views 最优相机视角
8.5 [8.5] 2504.06620 InstantSticker: Realistic Decal Blending via Disentangled Object Reconstruction
[{'name': 'Yi Zhang, Xiaoyang Huang, Yishun Dou, Yue Shi, Rui Shi, Ye Chen, Bingbing Ni, Wenjun Zhang'}]
3D Reconstruction and Modeling 三维重建 v2
decal blending
3D reconstruction
real-time rendering
Input: Multi-view images 多视角图像
Step1: Decal blending preparation 贴花合成准备
Step2: Shadow factor integration 阴影因子集成
Step3: ARAP parameterization 优化参数化
Output: High-quality decal blending outputs 高质量贴花合成结果
8.5 [8.5] 2504.06627 FACT: Multinomial Misalignment Classification for Point Cloud Registration
[{'name': 'Ludvig Dillén, Per-Erik Forssén, Johan Edstedt'}]
Point Cloud Processing 点云处理 v2
Point Cloud Registration 点云注册
Alignment Quality Prediction 对齐质量预测
Multinomial Misalignment Classification 多项式对齐分类
Input: Registered lidar point cloud pairs 注册的激光雷达点云对
Step1: Feature extraction 特征提取
Step2: Processing with point transformer-based network 使用基于点的变换网络进行处理
Step3: Multinomial misalignment classification 多项式对齐分类
Output: Misalignment class prediction 预测对齐误差类别
8.5 [8.5] 2504.06638 HGMamba: Enhancing 3D Human Pose Estimation with a HyperGCN-Mamba Network
[{'name': 'Hu Cui, Tessai Hayama'}]
3D Reconstruction and Modeling 三维重建 v2
3D human pose estimation
Hyper-GCN
Mamba networks
Input: 2D human pose data 2D 人体姿态数据
Step1: Model local structures 模型局部结构
Step2: Model global dependencies 模型全局依赖
Step3: Adaptive fusion 自适应融合
Output: 3D human pose estimates 3D 人体姿态估计
8.5 [8.5] 2504.06647 Uni-PrevPredMap: Extending PrevPredMap to a Unified Framework of Prior-Informed Modeling for Online Vectorized HD Map Construction
[{'name': 'Nan Peng, Xun Zhou, Mingming Wang, Guisong Chen, Songming Chen'}]
Autonomous Systems and Robotics 自动驾驶 v2
autonomous driving
HD maps
Input: Previous predictions and simulated outdated HD maps 先前预测和模拟过时的高清地图
Step1: Framework development 框架设计
Step2: Efficient data processing and retrieval 效率数据处理与检索
Step3: Model validation and performance evaluation 模型验证与性能评估
Output: Enhanced online vectorized HD maps 改进的在线矢量高清地图
8.5 [8.5] 2504.06742 nnLandmark: A Self-Configuring Method for 3D Medical Landmark Detection
[{'name': 'Alexandra Ertl, Shuhan Xiao, Stefan Denner, Robin Peretzke, David Zimmerer, Peter Neher, Fabian Isensee, Klaus Maier-Hein'}]
3D Reconstruction and Modeling 三维重建 v2
3D landmark detection
nnU-Net
medical imaging
Input: 3D medical images 3D医学图像
Step1: Adapt nnU-Net for landmarks 使用nnU-Net进行地标适配
Step2: Perform heatmap-based regression 进行基于热图的回归
Step3: Model evaluation and validation 模型评估与验证
Output: Accurate 3D landmark detection 准确的3D地标检测
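Heatmap-based regression (Step2) trains against a Gaussian blob per landmark and decodes the landmark as the heatmap peak. A minimal 3D sketch of target generation and decoding; sizes and sigma are illustrative:

```python
import numpy as np

def gaussian_heatmap(shape, center, sigma=2.0):
    """Render a target heatmap with a Gaussian blob at the landmark."""
    grid = np.indices(shape)                       # (3, D, H, W) coordinates
    d2 = sum((g - c) ** 2 for g, c in zip(grid, center))
    return np.exp(-d2 / (2.0 * sigma ** 2))

def decode_landmark(heatmap):
    """Recover the landmark as the voxel with maximal response."""
    return np.unravel_index(np.argmax(heatmap), heatmap.shape)

hm = gaussian_heatmap((16, 16, 16), (4, 8, 12))
pt = decode_landmark(hm)   # -> (4, 8, 12)
```

In practice a network predicts `hm`; sub-voxel accuracy is often recovered by a local weighted centroid around the argmax rather than the plain peak.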
8.5 [8.5] 2504.06803 DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation
[{'name': 'Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Hao Luo, Yibing Song, Gao Huang, Fan Wang, Yang You'}]
Image and Video Generation 图像和视频生成 v2
visual generation 视觉生成
Diffusion Transformers 扩散变换器
computational efficiency 计算效率
Input: Visual generation tasks 视觉生成任务
Step1: Dynamic computation adjustment 动态计算调整
Step2: Implementing TDW and SDT strategies 实施TDW和SDT策略
Step3: Integrating with existing diffusion models 与现有扩散模型的整合
Output: Efficient visual generation 高效视觉生成
8.5 [8.5] 2504.06863 MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking
[{'name': 'Chang Nie, Yiqing Xu, Guangming Wang, Zhe Liu, Yanzi Miao, Hesheng Wang'}]
Robotic Perception 机器人感知 v2
Moving Object Segmentation
Deep Learning
Autonomous Driving
Machine Learning
Input: Single images 单幅图像
Step1: Generate text prompts from Multimodal Large Language Model (MLLM) 使用多模态大型语言模型生成文本提示
Step2: Segment moving objects using Segment Anything Model (SAM) and Vision-Language Model (VLM) 使用SAM和VLM进行移动对象分割
Step3: Implement a deep thinking loop to refine segmentation results 实施深度思维循环以优化分割结果
Output: Segmented moving objects 输出分割的移动对象
8.5 [8.5] 2504.06920 S-EO: A Large-Scale Dataset for Geometry-Aware Shadow Detection in Remote Sensing Applications
[{'name': 'Masquil Elías, Marí Roger, Ehret Thibaud, Meinhardt-Llopis Enric, Musé Pablo, Facciolo Gabriele'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
shadow detection
remote sensing
Input: Multi-date, multi-angle satellite imagery 多日期多角度卫星影像
Step1: Data collection and annotation 数据收集与注释
Step2: Training of shadow detection model 阴影检测模型训练
Step3: Integration with 3D reconstruction models 与三维重建模型集成
Output: Improved shadow detection and 3D model quality 改进的阴影检测和三维模型质量
8.5 [8.5] 2504.06925 Are Vision-Language Models Ready for Dietary Assessment? Exploring the Next Frontier in AI-Powered Food Image Recognition
[{'name': 'Sergio Romero-Tapiador, Ruben Tolosana, Blanca Lacruz-Pleguezuelos, Laura Judith Marcos Zambrano, Guadalupe X. Bazán, Isabel Espinosa-Salinas, Julian Fierrez, Javier Ortega-Garcia, Enrique Carrillo de Santa Pau, Aythami Morales'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
food image recognition
dietary assessment
Input: Food images 食品图像
Step1: Database creation 数据库创建
Step2: Model evaluation 模型评估
Step3: Comparison of VLMs with expert annotations 与专家注释的VLM比较
Output: Food recognition results 食品识别结果
8.5 [8.5] 2504.07093 FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution
[{'name': 'Gene Chou, Wenqi Xian, Guandao Yang, Mohamed Abdelfattah, Bharath Hariharan, Noah Snavely, Ning Yu, Paul Debevec'}]
Depth Estimation 深度估计 v2
depth estimation
real-time processing
video analysis
Input: Streaming video at 2K resolution 2K分辨率视频
Step1: Preprocess video frames 预处理视频帧
Step2: Depth estimation using modified pretrained model 使用修改后的预训练模型进行深度估计
Step3: Alignment of depth features 对深度特征进行对齐
Output: High-resolution depth maps 输出高分辨率深度图
7.0 [7.0] 2504.06835 LVC: A Lightweight Compression Framework for Enhancing VLMs in Long Video Understanding
[{'name': 'Ziyi Wang, Haoran Wu, Yiming Rong, Deyang Jiang, Yixin Zhang, Yunlong Zhao, Shuang Xu, Bo XU'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
long video understanding
Input: Short video-text pairs 短视频-文本对
Step1: Video compression 视频压缩
Step2: Model enhancement 模型增强
Output: Improved VLM performance 改进的VLM性能

Arxiv 2025-04-09

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2504.05400 GARF: Learning Generalizable 3D Reassembly for Real-World Fractures
[{'name': 'Sihang Li, Zeyu Jiang, Grace Chen, Chenyang Xu, Siqi Tan, Xue Wang, Irving Fang, Kristof Zyskowski, Shannon P. McPherron, Radu Iovita, Chen Feng, Jing Zhang'}]
3D Reconstruction and Modeling 三维重建 v2
3D reassembly 三维重组
fracture 断裂
dataset 数据集
Input: Various fractured 3D objects 各种破碎的三维物体
Step1: Fracture-aware feature learning 破碎感知特征学习
Step2: Flow matching for alignment 对齐的流匹配
Step3: One-step preassembly for robustness 一步预组装以提高鲁棒性
Output: Reassembled 3D models 重新组装的三维模型
9.5 [9.5] 2504.05649 POD: Predictive Object Detection with Single-Frame FMCW LiDAR Point Cloud
[{'name': 'Yining Shi, Kun Jiang, Xin Zhao, Kangan Qian, Chuchu Xie, Tuopu Wen, Mengmeng Yang, Diange Yang'}]
3D Object Detection 3D物体检测 v2
3D object detection
FMCW LiDAR
autonomous driving
Input: Single-frame FMCW LiDAR point cloud
Step1: Generate virtual future point using ray casting
Step2: Create virtual two-frame point clouds
Step3: Encode with a sparse 4D encoder
Output: Predictive object detection results
9.5 [9.5] 2504.05698 Point-based Instance Completion with Scene Constraints
[{'name': 'Wesley Khademi, Li Fuxin'}]
3D Reconstruction and Modeling 三维重建 v2
point cloud
3D reconstruction
scene completion
autonomous systems
Input: Partial point clouds of objects 场景中物体的部分点云
Step1: Seed generation 生成种子点
Step2: Scene constraints integration 场景约束集成
Step3: Instance completion 实例补全
Output: Completed 3D objects 完成的三维对象
9.5 [9.5] 2504.05720 QEMesh: Employing A Quadric Error Metrics-Based Representation for Mesh Generation
[{'name': 'Jiaqi Li, Ruowei Wang, Yu Liu, Qijun Zhao'}]
3D Generation 三维生成 v2
3D reconstruction
mesh generation
Quadric Error Metrics
Input: Multi-view images 多视角图像
Step1: Data integration 数据集成
Step2: Algorithm development 算法开发
Step3: Model evaluation 模型评估
Output: Enhanced 3D models 改进的三维模型
9.5 [9.5] 2504.05751 InvNeRF-Seg: Fine-Tuning a Pre-Trained NeRF for 3D Object Segmentation
[{'name': 'Jiangsan Zhao, Jakob Geipel, Krzysztof Kusnierek, Xuean Cui'}]
3D Segmentation 3D分割 v2
Neural Radiance Fields
3D segmentation
fine-tuning
Input: Multi-view RGB images and 2D segmentation masks 多视角RGB图像和2D分割掩膜
Step1: Train standard NeRF on RGB images 使用RGB图像训练标准NeRF
Step2: Fine-tune with 2D segmentation masks using the same NeRF architecture 使用相同的NeRF架构用2D分割掩膜进行微调
Output: Segmented 3D point clouds 输出:分割的3D点云
9.5 [9.5] 2504.06178 Flash Sculptor: Modular 3D Worlds from Objects
[{'name': 'Yujia Hu, Songhua Liu, Xingyi Yang, Xinchao Wang'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D reconstruction
scene generation
modular objects
image-to-3D
Input: Single image 单幅图像
Step1: Decouple tasks 任务分解
Step2: Estimate parameters 估计参数
Step3: Generate 3D scene 生成三维场景
Output: Compositional 3D scene 组合三维场景
9.5 [9.5] 2504.06210 HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation
[{'name': 'Yiming Liang, Tianhan Xu, Yuta Kikuchi'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
monocular videos
Input: Monocular video 单目视频
Step1: Motion decomposition 运动分解
Step2: Hierarchical representation design 层次表示设计
Step3: Gaussian deformation adjustment 高斯变形调整
Output: Enhanced dynamic 3D model 改进的动态三维模型
9.5 [9.5] 2504.06264 D^2USt3R: Enhancing 3D Reconstruction with 4D Pointmaps for Dynamic Scenes
[{'name': 'Jisang Han, Honggyu An, Jaewoo Jung, Takuya Narihira, Junyoung Seo, Kazumi Fukuda, Chaehyun Kim, Sunghwan Hong, Yuki Mitsufuji, Seungryong Kim'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
dynamic scenes
4D pointmaps
Input: Multi-view images and dynamic scene data 多视角图像和动态场景数据
Step1: Regressing 4D pointmaps 回归4D点图
Step2: Establishing dense correspondences 进行密集对应
Step3: Model training with temporal awareness 模型训练与时间意识
Output: Enhanced 3D reconstruction 改进的三维重建
9.2 [9.2] 2504.06003 econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians
[{'name': 'Can Zhang, Gim Hee Lee'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D semantic segmentation 3D语义分割
multi-view consistency 多视角一致性
open-vocabulary segmentation 开放词汇分割
Input: Multi-view images 多视角图像
Step1: Data refinement using Confidence-region Guided Regularization (CRR) 使用信心区域引导正则化进行数据细化
Step2: Constructing a low-dimensional contextual space 创建低维上下文空间
Step3: Fusing backprojected multi-view features 融合反投影的多视角特征
Output: 3D semantic field representation 3D语义场表示
8.5 [8.5] 2504.05422 EP-Diffuser: An Efficient Diffusion Model for Traffic Scene Generation and Prediction via Polynomial Representations
[{'name': 'Yue Yao, Mohamed-Khalil Bouzidi, Daniel Goehring, Joerg Reichardt'}]
Autonomous Systems and Robotics 自动驾驶 v2
traffic scene generation
autonomous vehicles
generative models
Input: Road layout and agent history 道路布局和代理历史
Step1: Model designing using polynomial representations 模型设计,使用多项式表示
Step2: Training the diffusion-based generative model 训练基于扩散的生成模型
Step3: Evaluating traffic scene predictions 评估交通场景预测
Output: Diverse and plausible traffic scene continuations 生成多样和合理的交通场景延续
8.5 [8.5] 2504.05579 TAPNext: Tracking Any Point (TAP) as Next Token Prediction
[{'name': 'Artem Zholus, Carl Doersch, Yi Yang, Skanda Koppula, Viorica Patraucean, Xu Owen He, Ignacio Rocco, Mehdi S. M. Sajjadi, Sarath Chandar, Ross Goroshin'}]
3D Reconstruction and Modeling 三维重建 v2
point tracking
3D reconstruction
robotics
Input: Video frames 视频帧
Step1: Point tracking 点追踪
Step2: Token decoding 令牌解码
Step3: Model evaluation 模型评估
Output: Accurate point tracks 准确的点轨迹
8.5 [8.5] 2504.05786 How to Enable LLM with 3D Capacity? A Survey of Spatial Reasoning in LLM
[{'name': 'Jirong Zha, Yuxuan Fan, Xiao Yang, Chen Gao, Xinlei Chen'}]
3D Reconstruction and Modeling 三维重建 v2
3D spatial understanding
Large Language Models
multimodal fusion
autonomous vehicles
robotics
Input: Integration of Large Language Models (LLMs) with 3D spatial understanding
Step1: Categorization into image-based, point cloud-based, and hybrid modality methods
Step2: Systematic review of existing research methods
Step3: Discussion on limitations and future directions
Output: Comprehensive framework for 3D-LLM integration
8.5 [8.5] 2504.05882 Turin3D: Evaluating Adaptation Strategies under Label Scarcity in Urban LiDAR Segmentation with Semi-Supervised Techniques
[{'name': 'Luca Barco, Giacomo Blanco, Gaetano Chiriaco, Alessia Intini, Luigi La Riccia, Vittorio Scolamiero, Piero Boccardo, Paolo Garza, Fabrizio Dominici'}]
3D Semantic Segmentation 三维语义分割 v2
3D segmentation
LiDAR
urban modeling
Input: Aerial LiDAR data 空中激光雷达数据
Step1: Dataset collection 数据集收集
Step2: Performance benchmarking 性能基准测试
Step3: Semi-supervised learning application 半监督学习应用
Output: Improved 3D semantic segmentation results 改进的3D语义分割结果
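Step3's semi-supervised application typically involves some form of pseudo-labelling. As a sketch only (the paper's actual techniques are not detailed in this summary), a nearest-class-centroid pseudo-labeller with a confidence margin:

```python
import numpy as np

def pseudo_label(feats_l, labels_l, feats_u, margin=0.5):
    """Assign each unlabelled point the label of the closest class centroid,
    keeping it only when the gap between the best and second-best centroid
    distance exceeds `margin` (a crude confidence filter)."""
    classes = np.unique(labels_l)
    centroids = np.stack([feats_l[labels_l == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(feats_u[:, None] - centroids[None], axis=2)
    order = np.sort(d, axis=1)
    keep = (order[:, 1] - order[:, 0]) > margin
    return classes[d.argmin(axis=1)], keep

# Toy features: two well-separated classes plus one ambiguous point.
feats_l = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
labels_l = np.array([0, 0, 1, 1])
feats_u = np.array([[0.1, 0.1], [5.1, 5.1], [2.5, 2.5]])
labels, keep = pseudo_label(feats_l, labels_l, feats_u)
```

Only the confidently labelled points would be added back to the training set; the ambiguous mid-point is rejected by the margin test.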
8.5 [8.5] 2504.05908 PRIMEDrive-CoT: A Precognitive Chain-of-Thought Framework for Uncertainty-Aware Object Interaction in Driving Scene Scenario
[{'name': 'Sriram Mandalika, Lalitha V, Athira Nambiar'}]
Autonomous Driving 自动驾驶 v2
3D object detection
autonomous driving
uncertainty-aware modeling
Input: LiDAR-based 3D object detection and multi-view RGB references
Step1: Model Training with Bayesian Graph Neural Networks (BGNNs)
Step2: Uncertainty modeling for object interactions
Step3: Evaluation on DriveCoT dataset
Output: Enhanced decision-making under uncertainty
8.0 [8.0] 2504.05458 Optimizing 4D Gaussians for Dynamic Scene Video from Single Landscape Images
[{'name': 'In-Hwan Jin, Haesoo Choo, Seong-Hun Jeong, Heemoon Park, Junghwan Kim, Oh-joon Kwon, Kyeongbo Kong'}]
3D Reconstruction and Modeling 三维重建 v2
3D space virtualization 3D空间虚拟化
dynamic scene video 动态场景视频
Input: Single landscape image 单个景观图像
Step1: Generate multi-view images 生成多视角图像
Step2: Optimize 3D Gaussians 优化3D高斯
Step3: Estimate consistent 3D motion 估计一致的3D运动
Output: Dynamic scene video 动态场景视频
8.0 [8.0] 2504.05979 An Empirical Study of GPT-4o Image Generation Capabilities
[{'name': 'Sixiang Chen, Jinbin Bai, Zhuoran Zhao, Tian Ye, Qingyu Shi, Donghao Zhou, Wenhao Chai, Xin Lin, Jianzong Wu, Chao Tang, Shilin Xu, Tao Zhang, Haobo Yuan, Yikang Zhou, Wei Chow, Linfeng Li, Xiangtai Li, Lei Zhu, Lu Qi'}]
Image Generation 图像生成 v2
image generation
multimodal models
GPT-4o
image-to-3D generation
Input: Generative models and tasks 生成模型和任务
Step1: Evaluation against existing models 与现有模型的评估
Step2: Benchmarking across categories 在各类任务中的基准测试
Step3: Comparative analysis of strengths and limitations 优势和局限性的比较分析
Output: Comprehensive evaluation results 综合评估结果
7.5 [7.5] 2504.05402 Time-adaptive Video Frame Interpolation based on Residual Diffusion
[{'name': 'Victor Fonte Chavez, Claudia Esteves, Jean-Bernard Hayet'}]
Image and Video Generation 图像生成与视频生成 v2
Video Frame Interpolation
Diffusion Models
Animation
Input: Animation frames 动画帧
Step1: Time handling during training 训练过程中的时间处理
Step2: Adapt diffusion scheme for VFI 适应扩散方案用于视频帧插值
Step3: Uncertainty estimation 不确定性估计
Output: Interpolated video frames 插值视频帧
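Time-adaptive interpolation means the model is conditioned on an arbitrary timestamp t between two frames. The paper refines this with residual diffusion; as a sketch, the time-conditioned linear baseline such a model would improve upon:

```python
import numpy as np

def blend_frames(f0, f1, t):
    """Linear baseline for time-t interpolation between two frames.
    A diffusion-based method would refine (or replace) this coarse
    estimate; only the time-conditioning idea is shown here."""
    assert 0.0 <= t <= 1.0
    return (1.0 - t) * f0 + t * f1

f0 = np.zeros((4, 4), dtype=np.float32)
f1 = np.ones((4, 4), dtype=np.float32)
mid = blend_frames(f0, f1, 0.25)
```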
6.5 [6.5] 2504.05456 Generative Adversarial Networks with Limited Data: A Survey and Benchmarking
[{'name': 'Omar De Mitri, Ruyu Wang, Marco F. Huber'}]
Image Generation 图像生成 v2
Generative Adversarial Networks
Limited Data
Image Synthesis
Generative Models
Input: Limited datasets 限量数据
Step 1: Literature review 文献综述
Step 2: Performance evaluation 性能评估
Step 3: Challenge identification 挑战识别
Output: Insights on GAN performance 生成对抗网络性能见解

Arxiv 2025-04-08

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2504.03875 3D Scene Understanding Through Local Random Access Sequence Modeling
[{'name': 'Wanhee Lee, Klemen Kotar, Rahul Mysore Venkatesh, Jared Watrous, Honglin Chen, Khai Loong Aw, Daniel L. K. Yamins'}]
3D Reconstruction and Modeling 三维重建 v2
3D scene understanding
novel view synthesis
depth estimation
Input: Single images 单幅图像
Step1: Local patch quantization 局部图块量化
Step2: Randomly ordered sequence generation 随机顺序生成
Step3: 3D scene editing via optical flow 通过光流进行三维场景编辑
Output: Enhanced capabilities for 3D scene understanding 改进的三维场景理解能力
9.5 [9.5] 2504.03886 WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments
[{'name': 'Jianhao Zheng, Zihan Zhu, Valentin Bieri, Marc Pollefeys, Songyou Peng, Iro Armeni'}]
Simultaneous Localization and Mapping (SLAM) 同时定位与地图构建 v2
3D reconstruction 三维重建
dynamic environments 动态环境
SLAM 同时定位与地图构建
Input: Monocular video sequence 单目视频序列
Step1: Generate uncertainty map 生成不确定性地图
Step2: Dynamic object removal 动态物体移除
Step3: Dense bundle adjustment and Gaussian map optimization 密集束调整与高斯地图优化
Output: 3D Gaussian map and camera trajectory 3D高斯地图和相机轨迹
9.5 [9.5] 2504.04190 Interpretable Single-View 3D Gaussian Splatting using Unsupervised Hierarchical Disentangled Representation Learning
[{'name': 'Yuyang Zhang, Baao Xie, Hu Zhu, Qi Wang, Huanting Guo, Xin Jin, Wenjun Zeng'}]
3D Reconstruction 三维重建 v2
3D reconstruction
Gaussian Splatting
interpretability
disentangled representation learning
single-view
Input: Single-view images 单视角图像
Step1: Data integration 数据集成
Step2: Hierarchical disentangled representation learning (DRL) 层次化解耦表征学习
Step3: 3D geometry and appearance disentanglement 3D几何和外观解耦
Output: Interpretable and high-quality 3D models 可解释的高质量3D模型
9.5 [9.5] 2504.04294 3R-GS: Best Practice in Optimizing Camera Poses Along with 3DGS
[{'name': 'Zhisheng Huang, Peng Wang, Jingdong Zhang, Yuan Liu, Xin Li, Wenping Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
Structure-from-Motion
camera pose optimization
Input: 3D Gaussian representations and camera poses 3D高斯表示和相机姿态
Step1: Joint optimization of 3D Gaussians and camera parameters 联合优化3D高斯和相机参数
Step2: Implement 3DGS-MCMC for robustness 实施3DGS-MCMC以增强鲁棒性
Step3: Use an MLP for camera pose refinement 使用多层感知机(MLP)进行相机姿态优化
Output: High-quality novel views and accurate camera poses 输出:高质量的新视图和准确的相机姿态
9.5 [9.5] 2504.04448 Thermoxels: a voxel-based method to generate simulation-ready 3D thermal models
[{'name': 'Etienne Chassaing, Florent Forest, Olga Fink, Malcolm Mielle'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
finite element analysis
thermal modeling
voxel-based modeling
Input: Sparse RGB and thermal images RGB和热图像作为输入
Step1: Voxel representation 体素表示
Step2: Geometry and temperature optimization 几何和温度优化
Step3: Model transformation to tetrahedral meshes 将模型转换为四面体网格
Output: FEA-compatible 3D models 输出: 兼容FEA的3D模型
9.5 [9.5] 2504.04454 PRISM: Probabilistic Representation for Integrated Shape Modeling and Generation
[{'name': 'Lei Cheng, Mahdi Saleh, Qing Cheng, Lu Sang, Hongli Xu, Daniel Cremers, Federico Tombari'}]
3D Reconstruction and Modeling 三维重建 v2
3D shape generation
Statistical Shape Models (SSM)
Gaussian Mixture Models (GMM)
Input: Real-world objects 真实物体
Step1: Integration of Statistical Shape Models (SSM) and Gaussian Mixture Models (GMM) 整合统计形状模型和高斯混合模型
Step2: Application of categorical diffusion models 应用类别扩散模型
Step3: Shape generation and manipulation 形状生成与操作
Output: High-fidelity, structurally coherent 3D shapes 高保真、结构一致的三维形状
9.5 [9.5] 2504.04597 Targetless LiDAR-Camera Calibration with Anchored 3D Gaussians
[{'name': 'Haebeom Jung, Namtae Kim, Jungwoo Kim, Jaesik Park'}]
3D Reconstruction and Modeling 三维重建 v2
LiDAR-camera calibration
3D Gaussian
autonomous driving
Input: LiDAR and camera data 激光雷达与相机数据
Step1: Freeze reliable LiDAR points as anchors 固定可靠的激光雷达点作为锚点
Step2: Jointly optimize sensor poses and Gaussian parameters 联合优化传感器姿态和高斯参数
Step3: Evaluate using photometric loss 使用光度损失进行评估
Output: Improved calibration poses 改进的标定姿态
9.5 [9.5] 2504.04679 DeclutterNeRF: Generative-Free 3D Scene Recovery for Occlusion Removal
[{'name': 'Wanzhou Liu, Zhexiao Xiong, Xinyu Li, Nathan Jacobs'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
occlusion removal
Neural Radiance Fields
Input: Incomplete images 不完整图像
Step1: Joint multi-view optimization of learnable camera parameters 学习相机参数的多视角联合优化
Step2: Application of occlusion annealing regularization 应用遮挡退火正则化
Step3: Use of stochastic structural similarity loss 使用随机结构相似性损失
Output: High-quality 3D scene reconstructions 高质量的三维场景重建
9.5 [9.5] 2504.04732 Inverse++: Vision-Centric 3D Semantic Occupancy Prediction Assisted with 3D Object Detection
[{'name': 'Zhenxing Ming, Julie Stephany Berrio, Mao Shan, Stewart Worrall'}]
3D Reconstruction and Modeling 三维重建 v2
3D semantic occupancy prediction 3D语义占用预测
autonomous vehicles 自动驾驶
3D object detection 3D物体检测
Input: Surround-view images 360度视图图像
Step1: Introduce 3D object detection auxiliary branch 引入3D物体检测辅助分支
Step2: Enhance intermediate feature supervision 增强中间特征监督
Step3: Generate 3D semantic occupancy grid 生成3D语义占用网格
Output: Improved 3D perception capabilities 改进的3D感知能力
9.5 [9.5] 2504.05170 SSLFusion: Scale & Space Aligned Latent Fusion Model for Multimodal 3D Object Detection
[{'name': 'Bonan Ding, Jin Xie, Jing Nie, Jiale Cao'}]
3D Object Detection 三维物体检测 v2
3D object detection
feature fusion
autonomous systems
Input: Multi-modal data (LiDAR and camera images) 输入: 多模态数据(激光雷达和摄像机图像)
Step1: Feature extraction 特征提取
Step2: Scale-aligned feature fusion 按尺度对齐的特征融合
Step3: 3D-to-2D space alignment 3D到2D空间对齐
Step4: Cross-modal latent fusion 跨模态潜变量融合
Output: Accurate 3D object detection results 输出: 精确的3D物体检测结果
9.5 [9.5] 2504.05249 Texture2LoD3: Enabling LoD3 Building Reconstruction With Panoramic Images
[{'name': 'Wenzhao Tang, Weihang Li, Xiucheng Liang, Olaf Wysocki, Filip Biljecki, Christoph Holst, Boris Jutzi'}]
3D Reconstruction and Modeling 三维重建 v2
3D building reconstruction
LoD3
panoramic images
semantic segmentation
Input: Panoramic street-level images 全景街景图像
Step1: Image-to-object matching 图像与对象匹配
Step2: 3D model B-Rep surface simplification 3D模型边界表示表面简化
Step3: Ortho-rectification of images 图像正射校正
Step4: Facade segmentation 立面分割
Output: Enhanced Level of Detail 3D building models 改进的细节层次(LoD) 3D建筑模型
9.2 [9.2] 2504.03868 Control Map Distribution using Map Query Bank for Online Map Generation
[{'name': 'Ziming Liu, Leichen Wang, Ge Yang, Xinrun Li, Xingtao Hu, Hao Sun, Guangyu Gao'}]
Autonomous Systems and Robotics 自动驾驶系统与机器人 v2
High-definition maps 高清地图
Online map generation 在线地图生成
Autonomous driving 自动驾驶
Transformers 变换器
Input: Low-cost standard definition map data (SD map) 标准定义地图数据
Step1: Map query bank decomposition 地图查询库分解
Step2: Initial distribution generation for scenarios 场景的初始分布生成
Step3: Map predictions optimization 地图预测优化
Output: Optimized HD maps 优化的高清地图
9.2 [9.2] 2504.05303 InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
[{'name': "Sai Kumar Dwivedi, Dimitrije Anti\'c, Shashank Tripathi, Omid Taheri, Cordelia Schmid, Michael J. Black, Dimitrios Tzionas"}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
human-object interaction
Vision-Language Models
Input: In-the-wild images 在野外图像
Step1: Multi-view rendering 多视角渲染
Step2: 2D contact mask prediction 2D接触掩膜预测
Step3: 3D lifting of contact points 3D接触点提升
Output: 3D contact points 3D接触点
9.0 [9.0] 2504.04457 VSLAM-LAB: A Comprehensive Framework for Visual SLAM Methods and Datasets
[{'name': 'Alejandro Fontan, Tobias Fischer, Javier Civera, Michael Milford'}]
Simultaneous Localization and Mapping (SLAM) 同时定位与地图构建 v2
Visual SLAM
benchmarking
robotics
Input: VSLAM algorithms and datasets
Step1: Standardization of datasets and evaluation metrics
Step2: Automation of dataset downloading and preprocessing
Step3: Streamlined configuration and execution of experiments
Output: Efficient benchmarking of VSLAM systems
9.0 [9.0] 2504.04753 CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images
[{'name': 'Cheng Chen, Jiacheng Wei, Tianrun Chen, Chi Zhang, Xiaofeng Yang, Shangzhan Zhang, Bingchen Yang, Chuan-Sheng Foo, Guosheng Lin, Qixing Huang, Fayao Liu'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
CAD models
image generation
multi-view images
geometric features
Input: Unconstrained real-world CAD images 非约束的现实世界CAD图像
Step1: Geometry encoding 几何编码
Step2: Latent diffusion modeling 潜在扩散建模
Step3: Code checking for validity 代码有效性检查
Output: Generated parametric CAD models 生成的参数化CAD模型
8.5 [8.5] 2504.04124 EMF: Event Meta Formers for Event-based Real-time Traffic Object Detection
[{'name': 'Muhammad Ahmed Ullah Khan, Abdul Hannan Khan, Andreas Dengel'}]
Autonomous Driving 自动驾驶 v2
event-based detection
autonomous driving
object detection
Input: Event camera data 事件相机数据
Step1: Develop Event Progression Extractor module 开发事件进展提取模块
Step2: Implement Metaformer architecture 实现Metaformer架构
Step3: Evaluate on traffic object detection benchmarks 在交通物体检测基准上进行评估
Output: Efficient traffic object detection model 高效的交通物体检测模型
8.5 [8.5] 2504.04158 JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration
[{'name': 'Yunlong Lin, Zixu Lin, Haoyu Chen, Panwang Pan, Chenxin Li, Sixiang Chen, Yeying Jin, Wenbo Li, Xinghao Ding'}]
Autonomous Systems and Robotics 自动驾驶 v2
autonomous driving
image restoration
perception systems
vision-language models
Input: Real-world degraded images 真实世界退化图像
Step1: Model integration 模型集成
Step2: Two-stage framework development 二阶段框架开发
Step3: Evaluation on CleanBench dataset 在CleanBench数据集上评估
Output: Enhanced perception metrics 改进的感知指标
8.5 [8.5] 2504.04348 OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning
[{'name': 'Shihao Wang, Zhiding Yu, Xiaohui Jiang, Shiyi Lan, Min Shi, Nadine Chang, Jan Kautz, Ying Li, Jose M. Alvarez'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
3D driving
vision-language models
Input: 3D driving tasks and vision-language dataset 3D驾驶任务和视觉语言数据集
Step 1: Data generation using counterfactual reasoning 基于反事实推理的数据生成
Step 2: Framework evaluation with Omni-L and Omni-Q Omni-L与Omni-Q的框架评估
Output: Improved decision-making models 改进的决策模型
8.5 [8.5] 2504.04540 The Point, the Vision and the Text: Does Point Cloud Boost Spatial Reasoning of Large Language Models?
[{'name': 'Weichen Zhang, Ruiying Peng, Chen Gao, Jianjie Fang, Xin Zeng, Kaiyuan Li, Ziyou Wang, Jinqiang Cui, Xin Wang, Xinlei Chen, Yong Li'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D spatial reasoning
point clouds
Large Language Models
Input: Point clouds, visual and text inputs 3D点云、视觉和文本输入
Step1: Evaluating spatial reasoning 评估空间推理能力
Step2: Developing a benchmark 开发基准测试
Step3: Analyzing model performance 模型性能分析
Output: Insights into 3D LLMs 对3D LLM的洞察
8.5 [8.5] 2504.04631 Systematic Literature Review on Vehicular Collaborative Perception -- A Computer Vision Perspective
[{'name': 'Lei Wan, Jianxin Zhao, Andreas Wiedholz, Manuel Bied, Mateus Martinez de Lucena, Abhishek Dinkar Jagtap, Andreas Festag, Ant\^onio Augusto Fr\"ohlich, Hannan Ejaz Keen, Alexey Vinel'}]
Autonomous Systems and Robotics 自动驾驶 v2
Collaborative Perception
Autonomous Vehicles
Computer Vision
Input: 106 peer-reviewed articles 106篇同行评审的文章
Step1: Literature selection 文献选择
Step2: Comparative analysis 比较分析
Step3: Identify research gaps 确定研究空白
Output: Systematic insights on CP 系统的CP洞察
8.5 [8.5] 2504.04701 DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation
[{'name': 'Bo-Wen Yin, Jiao-Long Cao, Ming-Ming Cheng, Qibin Hou'}]
3D Reconstruction and Modeling 三维重建 v2
RGBD segmentation
geometry prior
self-attention
Input: RGB and depth images RGB与深度图像
Step1: Feature extraction 特征提取
Step2: Geometry self-attention mechanism 几何自注意力机制
Step3: Model evaluation 模型评估
Output: Semantic segmentation results 语义分割结果
8.5 [8.5] 2504.04744 Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions
[{'name': 'He Zhu, Quyu Kong, Kechun Xu, Xunlong Xia, Bing Deng, Jieping Ye, Rong Xiong, Yue Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D object affordance
vision-language model
robotics
Input: Language instructions, visual observations, and interactions 语言指令、视觉观测和交互
Step1: Dataset collection 数据集收集
Step2: Multi-modal feature fusion 多模态特征融合
Step3: Model implementation and evaluation 模型实施与评估
Output: Grounded 3D object affordance 具备位置的3D对象效用
8.5 [8.5] 2504.04781 OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance
[{'name': 'Chaoyi Wang, Baoqing Li, Xinhan Di'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
3D-aware supervision
occlusion recognition
multi-modal
large language models
Input: Multi-modal vision-language model and 3D expert reconstruction model 多模态视觉语言模型和3D专家重建模型
Step1: Pre-train the vision-language model 预训练视觉语言模型
Step2: Train the 3D expert reconstruction model 训练3D专家重建模型
Step3: Implement Chain-of-Thoughts learning 实施思维链学习
Output: Enhanced recognition of occluded objects 改进的遮挡物体识别
8.5 [8.5] 2504.04837 Uni4D: A Unified Self-Supervised Learning Framework for Point Cloud Videos
[{'name': 'Zhi Zuo, Chenyi Zhuang, Zhiqiang Shen, Pan Gao, Jie Qin'}]
Point Cloud Processing 点云处理 v2
point cloud videos
self-supervised learning
4D representation
Input: Point cloud videos 点云视频
Step1: Model motion representation in latent space 在潜在空间中建模运动表示
Step2: Introduce latent and geometry tokens 引入潜在和几何标记
Step3: Train self-disentangled MAE 训练自解耦MAE
Output: Discriminative 4D representations 判别性的4D表示
8.5 [8.5] 2504.04841 Prior2Former -- Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation
[{'name': 'Sebastian Schmidt, Julius K\"orner, Dominik Fuchsgruber, Stefano Gasperini, Federico Tombari, Stephan G\"unnemann'}]
Autonomous Systems and Robotics 自动驾驶 v2
Panoptic segmentation 泛光分割
Anomaly detection 异常检测
Evidential learning 证据学习
Autonomous driving 自动驾驶
Input: Pixel-wise binary mask assignments 像素级二进制掩模分配
Step1: Incorporate Beta prior 引入Beta先验
Step2: Compute model uncertainty 计算模型不确定性
Step3: Perform anomaly and panoptic segmentation 执行异常和全景分割
Output: State-of-the-art segmentation results 最先进的分割结果
8.5 [8.5] 2504.05075 PvNeXt: Rethinking Network Design and Temporal Motion for Point Cloud Video Recognition
[{'name': 'Jie Wang, Tingfa Xu, Lihe Ding, Xinjie Zhang, Long Bai, Jianan Li'}]
Point Cloud Processing 点云处理 v2
point cloud recognition
4D representation learning
Input: Point cloud video sequences 点云视频序列
Step1: Motion capture through Motion Imitator 通过运动模仿器捕获运动
Step2: One-step query operation from Single-Step Motion Encoder 通过单步运动编码器进行单步查询操作
Output: Efficient point cloud video recognition 高效的点云视频识别
8.5 [8.5] 2504.05148 Stereo-LiDAR Fusion by Semi-Global Matching With Discrete Disparity-Matching Cost and Semidensification
[{'name': 'Yasuhiro Yao, Ryoichi Ishikawa, Takeshi Oishi'}]
Depth Estimation 深度估计 v2
Depth Estimation 深度估计
Sensor Fusion 传感器融合
Autonomous Systems 自主系统
Input: Stereo camera images and LiDAR data 立体相机图像和LiDAR数据
Step1: Apply Semi-Global Matching (SGM) to estimate disparity 使用Semi-Global Matching (SGM)估计视差
Step2: Implement Discrete Disparity-matching Cost (DDC) for disparity evaluation 实现离散视差匹配成本 (DDC) 用于视差评估
Step3: Perform semidensification to enhance disparity maps 进行半密集化以增强视差图
Step4: Execute stereo-LiDAR consistency check for validation 执行立体-激光雷达一致性检查以进行验证
Output: Accurate depth maps with improved performance 输出:准确的深度图并提高性能
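Step1's Semi-Global Matching aggregates a per-pixel matching cost. As a sketch of just that cost stage (SGM's path aggregation and the paper's DDC and semidensification are omitted), a 1D sum-of-squared-differences disparity search along one scanline:

```python
import numpy as np

def ssd_disparity(left_row, right_row, max_disp=4, win=1):
    """Per-pixel disparity by minimising a sum-of-squared-differences
    cost along one scanline -- the raw matching cost SGM aggregates."""
    n = len(left_row)
    disp = np.zeros(n, dtype=int)
    for x in range(n):
        lo, hi = max(x - win, 0), min(x + win + 1, n)
        best, best_d = float("inf"), 0
        # Only disparities that keep the right-image window in bounds.
        for d in range(min(max_disp, lo) + 1):
            cost = np.sum((left_row[lo:hi] - right_row[lo - d:hi - d]) ** 2)
            if cost < best:
                best, best_d = cost, d
        disp[x] = best_d
    return disp

# The left scanline is the right one shifted by a disparity of 2.
right = np.array([0.0, 1, 2, 3, 4, 5, 6, 7])
left = np.array([0.0, 0, 0, 1, 2, 3, 4, 5])
disp = ssd_disparity(left, right)
```

Real SGM smooths these per-pixel minima with penalties along multiple image paths; LiDAR points would then anchor or densify the result.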
8.5 [8.5] 2504.05152 PanoDreamer: Consistent Text to 360-Degree Scene Generation
[{'name': 'Zhexiao Xiong, Zhang Chen, Zhong Li, Yi Xu, Nathan Jacobs'}]
3D Generation 三维生成 v2
3D generation
text to 3D
geometric consistency
Input: Text description and/or reference image 文本描述和/或参考图像
Step1: Generate initial panoramic scene 生成初始全景场景
Step2: Lift panorama into 3D 提升全景至三维
Step3: Generate images from different viewpoints 根据不同视点生成图像
Step4: Compose images into a global point cloud 将图像合成全局点云
Step5: Use 3D Gaussian Splatting for final scene rendering 使用3D高斯点云进行最终场景渲染
8.5 [8.5] 2504.05201 3D Universal Lesion Detection and Tagging in CT with Self-Training
[{'name': 'Jared Frazier, Tejas Sudharshan Mathai, Jianfei Liu, Angshuman Paul, Ronald M. Summers'}]
3D Reconstruction and Modeling 三维重建 v2
3D lesion detection
self-training
computed tomography
Input: CT images 计算机断层扫描图像
Step1: Train VFNet model for 2D detection 训练VFNet模型进行二维检测
Step2: Expand 2D detection to 3D 将二维检测扩展到三维
Step3: Self-training with 3D proposals 进行自我训练以使用3D预测
Output: Tagged 3D lesions 标记的三维病变
7.5 [7.5] 2504.04099 TARAC: Mitigating Hallucination in LVLMs via Temporal Attention Real-time Accumulative Connection
[{'name': 'Chunzhao Xie, Tongxuan Liu, Lei Jiang, Yuting Zeng, jinrong Guo, Yunheng Shen, Weizhe Huang, Jing Li, Xiaohua Xu'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
hallucination mitigation
Temporal Attention
Input: Large Vision-Language Models (LVLMs)
Step 1: Investigate attention decay correlation with hallucinations
Step 2: Propose Temporal Attention Real-time Accumulative Connection (TARAC)
Step 3: Integrate TARAC into existing LVLM architectures
Output: Enhanced attention mechanisms mitigating hallucinations
7.5 [7.5] 2504.04676 Dual Consistent Constraint via Disentangled Consistency and Complementarity for Multi-view Clustering
[{'name': 'Bo Li, Jing Yun'}]
Multi-view and Stereo Vision 多视角与立体视觉 v2
Multi-view clustering 多视角聚类
Consistency 一致性
Complementarity 互补性
Input: Multi-view data 多视角数据
Step1: Separate shared and private information 分离共享和私有信息
Step2: Learn consistencies via contrastive learning to maximize mutual information 通过对比学习最大化互信息
Step3: Apply dual consistency constraints 使用双一致性约束
Output: Improved clustering performance 改进的聚类性能
7.5 [7.5] 2504.04911 IterMask3D: Unsupervised Anomaly Detection and Segmentation with Test-Time Iterative Mask Refinement in 3D Brain MR
[{'name': 'Ziyun Liang, Xiaoqing Guo, Wentian Xu, Yasin Ibrahim, Natalie Voets, Pieter M Pretorius, J. Alison Noble, Konstantinos Kamnitsas'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
anomaly detection
MRI segmentation
Input: 3D Brain MRI scans 3D 脑部 MRI 扫描
Step1: Spatial masking of images 空间掩蔽图像
Step2: Iterative mask refinement 迭代掩蔽精化
Step3: Anomaly reconstruction 异常重建
Output: Segmented anomalies 细分异常
7.0 [7.0] 2504.04740 Enhancing Compositional Reasoning in Vision-Language Models with Synthetic Preference Data
[{'name': 'Samarth Mishra, Kate Saenko, Venkatesh Saligrama'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
compositional reasoning
vision-language models
multimodal learning
Input: Multimodal large language models (MLLMs) 多模态大型语言模型
Step1: Synthetic preference data generation 合成偏好数据生成
Step2: Preference tuning on synthetic data 通过合成数据进行偏好调整
Step3: Model evaluation on compositional benchmarks 模型在组合推理基准上的评估
Output: Improved compositional reasoning capabilities 改进的组合推理能力

Arxiv 2025-04-07

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2504.03052 Cooperative Inference for Real-Time 3D Human Pose Estimation in Multi-Device Edge Networks
[{'name': 'Hyun-Ho Choi, Kangsoo Kim, Ki-Ho Lee, Kisong Lee'}]
3D Reconstruction and Modeling 三维重建 v2
3D pose estimation
cooperative inference
mobile edge computing
real-time processing
Input: Images captured by multiple cameras 多摄像头捕获的图像
Step1: 2D pose estimation from images 从图像中估计二维姿态
Step2: Offloading filtered images to edge server 将筛选后的图像转发到边缘服务器
Step3: 3D joint coordinate calculation on edge server 在边缘服务器上计算三维关节坐标
Output: Real-time 3D pose estimation 实时三维姿态估计
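Step3's 3D joint calculation from multiple 2D estimates is classically done by triangulation. A minimal DLT sketch with two toy cameras (the cameras and point below are illustrative, not the paper's setup):

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one 3D point from two pixel observations.
    Each observation contributes two rows to a homogeneous system A X = 0."""
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                 # null vector = homogeneous 3D point
    return X[:3] / X[3]

def project(P, X):
    """Pinhole projection of a 3D point with a 3x4 camera matrix."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two toy cameras: identity pose and a 1-unit baseline along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 3.0])

X_hat = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
```

With noiseless observations the null-space solution recovers the joint exactly; in the paper's setting the 2D poses come from per-device estimators, so the edge server would solve this per joint.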
9.5 [9.5] 2504.03059 Compressing 3D Gaussian Splatting by Noise-Substituted Vector Quantization
[{'name': 'Haishan Wang, Mohammad Hassan Vali, Arno Solin'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
memory compression
3D reconstruction
Input: 3D Gaussian Splatting models 3D高斯点云模型
Step1: Build attribute codebooks 构建属性码本
Step2: Apply noise-substituted vector quantization 应用噪声替代的向量量化
Step3: Optimize memory usage 优化内存使用
Output: Compressed 3D representations 压缩的三维表示
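The codebook idea in Step1 can be sketched with plain k-means over per-Gaussian attributes (the paper's noise-substitution trick is not detailed in this summary and is omitted; cluster count and attribute choice are assumptions):

```python
import numpy as np

def kmeans_codebook(attrs, k=4, iters=20, seed=0):
    """Build a small codebook over per-Gaussian attribute vectors.
    Returns (codebook, assignments): each Gaussian then stores only an
    integer index instead of its full attribute vector."""
    rng = np.random.default_rng(seed)
    centers = attrs[rng.choice(len(attrs), k, replace=False)]
    assign = np.zeros(len(attrs), dtype=int)
    for _ in range(iters):
        d = np.linalg.norm(attrs[:, None] - centers[None], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            pts = attrs[assign == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers, assign

# Toy "attributes": 200 Gaussians whose colours cluster around 4 values.
rng = np.random.default_rng(1)
base = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
attrs = base[rng.integers(0, 4, 200)] + 0.01 * rng.normal(size=(200, 3))
codebook, idx = kmeans_codebook(attrs, k=4)
```

Storage drops from one float vector per Gaussian to one small index plus a shared codebook, which is the essence of vector-quantized compression.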
9.5 [9.5] 2504.03164 NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving
[{'name': 'Kexin Tian, Jingrui Mao, Yunlong Zhang, Jiwan Jiang, Yang Zhou, Zhengzhong Tu'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
autonomous driving
spatial reasoning
3D scene graph
Input: NuScenes dataset with multi-modal sensor data 输入: NuScenes数据集与多模态传感器数据
Step1: 3D scene graph generation pipeline 3D场景图生成管道
Step2: QA generation pipeline 问答生成管道
Step3: Evaluation of VLMs on spatial understanding and reasoning VLM在空间理解和推理上的评估
Output: Benchmark for VLMs in autonomous driving 自动驾驶中的VLM基准
9.5 [9.5] 2504.03177 Detection Based Part-level Articulated Object Reconstruction from Single RGBD Image
[{'name': 'Yuki Kawana, Tatsuya Harada'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
articulated objects
RGBD images
Input: Single RGBD image 单个RGBD图像
Step1: Part detection 部件检测
Step2: Kinematics-aware part fusion 运动学感知部件融合
Step3: Anisotropic scale normalization 各向异性尺度归一化
Step4: Cross-refinement in output space 在输出空间进行交叉细化
Output: Reconstructed articulated shapes 重建的关节形状
9.5 [9.5] 2504.03198 Endo3R: Unified Online Reconstruction from Dynamic Monocular Endoscopic Video
[{'name': 'Jiaxin Guo, Wenzhen Dong, Tianyu Huang, Hao Ding, Ziyi Wang, Haomin Kuang, Qi Dou, Yun-Hui Liu'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
monocular video
surgical robotics
Input: Monocular surgical videos 单目外科视频
Step1: Data integration 数据集成
Step2: Algorithm implementation 算法实现
Step3: Uncertainty measurement 不确定性测量
Step4: Pointmap and depth prediction 点图和深度预测
Output: 3D models and camera parameters 3D模型和相机参数
9.5 [9.5] 2504.03258 TQD-Track: Temporal Query Denoising for 3D Multi-Object Tracking
[{'name': 'Shuxiao Ding, Yutong Yang, Julian Wiederer, Markus Braun, Peizheng Li, Juergen Gall, Bin Yang'}]
3D Multi-Object Tracking 3D多目标跟踪 v2
3D tracking
query denoising
autonomous driving
Input: Ground truth detections from previous frame
Step1: Generate denoising queries with noise
Step2: Propagate denoising queries to current frame
Step3: Predict corresponding ground truths
Output: Enhanced tracking results
9.5 [9.5] 2504.03438 ZFusion: An Effective Fuser of Camera and 4D Radar for 3D Object Perception in Autonomous Driving
[{'name': 'Sheng Yang, Tong Zhan, Shichen Qiao, Jicheng Gong, Qing Yang, Yanfeng Lu, Jian Wang'}]
3D Object Detection 3D物体检测 v2
3D object perception
autonomous driving
4D radar
Input: 4D radar and camera data 4D 雷达和相机数据
Step1: Fusion of sensor data 传感器数据融合
Step2: Feature extraction 特征提取
Step3: 3D object detection algorithm 3D物体检测算法
Output: Improved object perception 精确的物体感知
9.5 [9.5] 2504.03563 PF3Det: A Prompted Foundation Feature Assisted Visual LiDAR 3D Detector
[{'name': 'Kaidong Li, Tianxiao Zhang, Kuan-Chuan Peng, Guanghui Wang'}]
3D Object Detection and LiDAR Fusion 3D对象检测与激光雷达融合 v2
3D detection 3D检测
LiDAR
autonomous driving 自动驾驶
Input: Camera images and LiDAR point clouds 摄像机图像和激光雷达点云
Step1: Data preprocessing 数据预处理
Step2: Feature extraction using foundation model encoders 使用基础模型编码器进行特征提取
Step3: Soft prompt integration for feature fusion 用于特征融合的软提示集成
Step4: 3D detection model training 3D检测模型训练
Output: Enhanced 3D object detection results 改进的3D物体检测结果
9.5 [9.5] 2504.03602 Robust Human Registration with Body Part Segmentation on Noisy Point Clouds
[{'name': 'Kai Lascheit, Daniel Barath, Marc Pollefeys, Leonidas Guibas, Francis Engelmann'}]
3D Reconstruction and Modeling 三维重建 v2
3D human meshes
body-part segmentation
pose estimation
noisy point clouds
mesh fitting
Input: Noisy point clouds 噪声点云
Step1: Body part segmentation 身体部位分割
Step2: SMPL-X fitting SMPL-X拟合
Step3: Pose and orientation initialization 姿态和方向初始化
Step4: Refinement of point cloud alignment 点云对齐细化
Output: Accurate human mesh 人体网格
9.0 [9.0] 2504.03536 HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration
[{'name': 'Boyuan Wang, Runqi Ouyang, Xiaofeng Wang, Zheng Zhu, Guosheng Zhao, Chaojun Ni, Guan Huang, Lihong Liu, Xingang Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
human avatars
autonomous driving
Input: Single human image 单幅人像图像
Step1: 3D Gaussian Splatting for initial geometry 初步几何构建
Step2: Multi-view generation through integration 多视角图像生成
Step3: HumanFixer for restoration and refinement 修复与改进流程
Output: High-quality, animatable human avatars 输出:高质量可动画人形模型
8.5 [8.5] 2504.02884 Enhancing Traffic Sign Recognition On The Performance Based On Yolov8
[{'name': 'Baba Ibrahim (Hubei University of Automotive Technology), Zhou Kui (Hubei University of Automotive Technology)'}]
Autonomous Driving 自动驾驶 v2
Traffic Sign Recognition
Yolov8
Autonomous Driving
Input: Traffic sign images 交通标志图像
Step1: Data augmentation 数据增强
Step2: Model training using YOLOv8 使用YOLOv8模型训练
Step3: Model evaluation on various datasets 在不同数据集上评估模型
Output: Enhanced detection models 改进的检测模型
8.5 [8.5] 2504.02920 LiDAR-based Object Detection with Real-time Voice Specifications
[{'name': 'Anurag Kulkarni'}]
Autonomous Systems and Robotics 自动驾驶 v2
LiDAR
object detection
autonomous driving
real-time voice synthesis
Input: LiDAR and RGB data LiDAR和RGB数据
Step1: Data integration 数据集成
Step2: Object detection algorithm development 物体检测算法开发
Step3: Real-time voice synthesis implementation 实时语音合成实现
Output: Real-time feedback and 3D visualizations 实时反馈和3D可视化
8.5 [8.5] 2504.03047 Attention-Aware Multi-View Pedestrian Tracking
[{'name': 'Reef Alturki, Adrian Hilton, Jean-Yves Guillemaut'}]
Multi-view Stereo 多视角立体 v2
multi-view tracking
attention mechanisms
pedestrian detection
Input: Multi-view images 多视角图像
Step 1: Early-fusion for detection 早期融合进行检测
Step 2: Cross-attention mechanism for association 使用交叉注意机制进行关联
Step 3: Robust feature propagation 可靠特征传播
Output: Enhanced pedestrian tracking performance 改进的行人跟踪性能
8.5 [8.5] 2504.03089 SLACK: Attacking LiDAR-based SLAM with Adversarial Point Injections
[{'name': 'Prashant Kumar, Dheeraj Vattikonda, Kshitij Madhav Bhat, Kunal Dargan, Prem Kalra'}]
Simultaneous Localization and Mapping (SLAM) 同时定位与地图构建 v2
LiDAR-based SLAM
autonomous driving
adversarial attacks
point injections
Input: LiDAR scans from autonomous vehicles
Step1: Develop a novel autoencoder with segmentation-based attention
Step2: Integrate contrastive learning for precise LiDAR reconstructions
Step3: Implement point injections to test adversarial attacks
Output: Efficacy of point injections on SLAM navigation
8.5 [8.5] 2504.03171 Real-Time Roadway Obstacle Detection for Electric Scooters Using Deep Learning and Multi-Sensor Fusion
[{'name': 'Zeyang Zheng, Arman Hosseini, Dong Chen, Omid Shoghli, Arsalan Heydarian'}]
Autonomous Systems and Robotics 自动驾驶系统与机器人技术 v2
obstacle detection
e-scooter
deep learning
sensor fusion
Input: RGB camera and depth camera RGB相机和深度相机
Step1: Sensor integration 传感器集成
Step2: Obstacle detection using YOLO 使用YOLO进行障碍物检测
Step3: Depth data analysis 深度数据分析
Output: Real-time obstacle detection results 实时障碍物检测结果
8.5 [8.5] 2504.03193 Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation
[{'name': 'Xin Zhang, Robby T. Tan'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Domain Generalized Semantic Segmentation
Vision Foundation Models
Vision-Language Models
autonomous driving
computational efficiency
Input: Domain data and models 领域数据与模型
Step1: Feature extraction 特征提取
Step2: Model adaptation 模型适应
Step3: Domain generalization evaluation 域泛化评估
Output: Enhanced segmentation performance 改进的分割性能
8.5 [8.5] 2504.03306 Multi-Flow: Multi-View-Enriched Normalizing Flows for Industrial Anomaly Detection
[{'name': 'Mathis Kruse, Bodo Rosenhahn'}]
Multi-view Stereo 多视角立体 v2
Multi-view anomaly detection 多视角异常检测
Normalizing flows 正规化流
Industrial applications 工业应用
Input: Multi-view images 多视角图像
Step1: Data fusion 融合数据
Step2: Cross-view message passing 跨视图信息传递
Step3: Anomaly detection 进行异常检测
Output: Detected anomalies 检测到的异常
8.5 [8.5] 2504.03468 D-Garment: Physics-Conditioned Latent Diffusion for Dynamic Garment Deformations
[{'name': 'Antoine Dumoulin, Adnane Boukhayma, Laurence Boissieux, Bharath Bhushan Damodaran, Pierre Hellier, Stefanie Wuhrer'}]
3D Generation 三维生成 v2
3D Garment Deformation
Latent Diffusion Model
Dynamic Modeling
Vision Sensors
Input: 3D garment template 3D服装模板
Step1: Condition on body shape and motion 以身体形状和运动为条件
Step2: Use latent diffusion model 使用潜在扩散模型
Step3: Optimize to fit observations 最优化以适应观测
Output: Dynamically deformed garment output 动态变形服装输出
8.5 [8.5] 2504.03637 An Algebraic Geometry Approach to Viewing Graph Solvability
[{'name': "Federica Arrigoni, Kathl\'en Kohn, Andrea Fusiello, Tomas Pajdla"}]
Multi-view Geometry 多视图几何 v2
Viewing Graph
Structure-from-Motion
Algebraic Geometry
Input: Viewing graph associated with cameras 视图图与相机关联
Step1: Develop novel algebraic framework for solvability problems 提出新的代数框架用于求解问题
Step2: Analyze conditions for camera determinability 分析相机可确定性的条件
Step3: Implement computational methods for graph partitioning and solvability testing 实现图划分和求解测试的计算方法
Output: Improved understanding of structure-from-motion graphs and their solvability 改进对运动结构图及其可解性的理解
8.0 [8.0] 2504.03249 Robot Localization Using a Learned Keypoint Detector and Descriptor with a Floor Camera and a Feature Rich Industrial Floor
[{'name': 'Piet Brömmel, Dominik Brämer, Oliver Urbann, Diana Kleingarn'}]
Autonomous Systems and Robotics 自动驾驶 v2
robot localization
feature extraction
Input: Images of industrial floor 工业地面的图像
Step1: Keypoint extraction 关键点提取
Step2: Deep learning for features 深度学习获取特征
Step3: Position estimation 位置估计
Output: Accurate robot localization 准确的机器人定位
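The final localization step in pipelines like this one reduces, at its simplest, to a closed-form rigid alignment of matched keypoints. A minimal 2D least-squares sketch (illustrative only; the paper's learned detector and descriptor are not modeled here, and all names are assumptions):

```python
import math

def estimate_pose_2d(src, dst):
    """Closed-form least-squares rotation and translation mapping matched
    keypoints src onto dst (2D Kabsch/Procrustes). Returns (theta, tx, ty)."""
    n = len(src)
    csx = sum(p[0] for p in src) / n
    csy = sum(p[1] for p in src) / n
    cdx = sum(p[0] for p in dst) / n
    cdy = sum(p[1] for p in dst) / n
    s_cos = 0.0
    s_sin = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - csx, sy - csy   # centered source point
        bx, by = dx - cdx, dy - cdy   # centered target point
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx
    theta = math.atan2(s_sin, s_cos)  # optimal rotation angle
    # Translation carries the rotated source centroid onto the target centroid.
    tx = cdx - (csx * math.cos(theta) - csy * math.sin(theta))
    ty = cdy - (csx * math.sin(theta) + csy * math.cos(theta))
    return theta, tx, ty
```

Given three points rotated by 90° and shifted by (1, 2), the sketch recovers exactly that rotation and translation.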
7.5 [7.5] 2504.02876 Multimodal Reference Visual Grounding
[{'name': 'Yangxiao Lu, Ruosen Li, Liqiang Jing, Jikai Wang, Xinya Du, Yunhui Guo, Nicholas Ruozzi, Yu Xiang'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
multimodal reference visual grounding
large vision-language models
few-shot object detection
Input: Query image and reference images 输入: 查询图像和参考图像
Step1: Dataset creation for MRVG 创建MRVG数据集
Step2: Novel method for visual grounding using LLMs 开发基于LLMs的视觉定位新方法
Step3: Evaluation of the model's visual grounding performance 模型视觉定位性能的评估
Output: Bounding boxes or segmentation masks 输出: 目标对象的边界框或分割掩码
7.5 [7.5] 2504.03140 Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models
[{'name': 'Xuran Ma, Yexin Liu, Yaofu Liu, Xianfeng Wu, Mingzhe Zheng, Zihao Wang, Ser-Nam Lim, Harry Yang'}]
Image and Video Generation 图像生成与视频生成 v2
video generation
diffusion models
caching strategy
Input: Video sequences 视频序列
Step1: Analyze attention distributions 分析注意力分布
Step2: Develop adaptive caching strategy 开发自适应缓存策略
Step3: Validate through experiments 实验验证
Output: Efficient video generation 高效视频生成
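The adaptive caching idea in Step 2 can be sketched independently of any diffusion backbone: reuse an expensive feature while a cheap probe indicates it has barely changed, recompute otherwise. Everything below (the class, the threshold) is an illustrative assumption, not the paper's profiling-based strategy:

```python
def rel_change(a, b):
    """Mean absolute relative change between two feature vectors."""
    eps = 1e-8
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(abs(y) for y in b) + eps)

class FeatureCache:
    def __init__(self, tol=0.05):
        self.tol = tol       # reuse threshold on relative change
        self.cached = None   # last computed feature
        self.hits = 0
        self.misses = 0

    def get(self, compute, probe):
        # `probe` is a cheap estimate of the current feature; if it is close
        # to the cache, the expensive `compute` call is skipped entirely.
        if self.cached is not None and rel_change(probe, self.cached) < self.tol:
            self.hits += 1
            return self.cached
        self.cached = compute()
        self.misses += 1
        return self.cached
```

A real caching strategy would pick `tol` per layer and per timestep from profiling, which is the part the paper contributes.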
7.5 [7.5] 2504.03154 TokenFLEX: Unified VLM Training for Flexible Visual Tokens Inference
[{'name': 'Junshan Hu, Jialiang Mao, Zhikang Liu, Zhongpu Xia, Peng Jia, Xianpeng Lang'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
Token adaptation
Input: Images 输入图像
Step1: Stochastic training of vision tokens 随机训练视觉令牌
Step2: Dynamic adjustment of token counts 动态调整令牌数量
Step3: Experiments on vision-language benchmarks 在视觉-语言基准上的实验
Output: Performance evaluation and comparison with fixed-token models 输出:与固定令牌模型的性能评估和比较
7.5 [7.5] 2504.03440 Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models
[{'name': 'Mirko Borszukovszki, Ivo Pascal de Jong, Matias Valdenegro-Toro'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Visual Language Models
Uncertainty Estimation
Corrupted Images
Large Language Models
Input: Corrupted image data 受损图像数据
Step1: Model testing 模型测试
Step2: Uncertainty estimation 不确定性估计
Step3: Results analysis 结果分析
Output: Confidence scores 置信度分数
6.0 [6.0] 2504.03490 BUFF: Bayesian Uncertainty Guided Diffusion Probabilistic Model for Single Image Super-Resolution
[{'name': 'Zihao He, Shengchuan Zhang, Runze Hu, Yunhang Shen, Yan Zhang'}]
Image Generation 图像生成 v2
super-resolution
diffusion models
Input: Low-resolution images (LR) 低分辨率图像
Step1: Bayesian model generates uncertainty masks 贝叶斯模型生成不确定性掩码
Step2: Modulation of noise during diffusion process 在扩散过程中对噪声进行调制
Step3: Training with enhanced focus on high-uncertainty areas 在高不确定性区域进行增强关注的训练
Output: Super-resolved images 高分辨率图像
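Step 2's noise modulation can be sketched generically: scale the diffusion noise per pixel by an uncertainty mask so that uncertain regions keep more stochasticity. A minimal sketch under that assumption (illustrative, not the paper's Bayesian model):

```python
def modulate_noise(noise, uncertainty, strength=1.0):
    """Scale per-pixel noise by an uncertainty mask in [0, 1]:
    high-uncertainty pixels keep amplified noise, confident pixels are
    left closer to the unmodulated value."""
    return [[n * (1.0 + strength * u) for n, u in zip(noise_row, unc_row)]
            for noise_row, unc_row in zip(noise, uncertainty)]
```

With a mask of 0 the noise passes through unchanged; a mask of 1 doubles it at default strength.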
5.0 [5.0] 2504.03254 SARLANG-1M: A Benchmark for Vision-Language Modeling in SAR Image Understanding
[{'name': 'Yimin Wei, Aoran Xiao, Yexian Ren, Yuting Zhu, Hongruixuan Chen, Junshi Xia, Naoto Yokoya'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Synthetic Aperture Radar (SAR)
Vision-Language Models (VLMs)
Image Captioning
Visual Question Answering (VQA)
Input: SAR images and corresponding text annotations SAR 图像与对应文本注释
Step1: Dataset creation 数据集创建
Step2: Model training and evaluation 模型训练与评估
Output: Enhanced understanding of SAR images 改进的SAR图像理解

Arxiv 2025-04-04

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2504.02261 WonderTurbo: Generating Interactive 3D World in 0.72 Seconds
[{'name': 'Chaojun Ni, Xiaofeng Wang, Zheng Zhu, Weijie Wang, Haoyun Li, Guosheng Zhao, Jie Li, Wenkang Qin, Guan Huang, Wenjun Mei'}]
3D Generation 三维生成 v2
3D generation
real-time rendering
interactive 3D
Input: User-provided single image 用户提供的单张图像
Step1: Implement StepSplat for geometric updates 实现StepSplat进行几何更新
Step2: Use QuickDepth for depth consistency 使用QuickDepth确保深度一致性
Step3: Apply FastPaint for appearance inpainting 应用FastPaint进行外观修复
Output: Interactive 3D scenes with high-quality output 输出: 高质量的交互式3D场景
9.5 [9.5] 2504.02270 MinkOcc: Towards real-time label-efficient semantic occupancy prediction
[{'name': 'Samuel Sze, Daniele De Martini, Lars Kunze'}]
3D Reconstruction and Modeling 三维重建 v2
3D semantic occupancy prediction
autonomous driving
Input: Multi-view images and LiDAR data 多视角图像和激光雷达数据
Step1: Warm-start with small dataset of 3D annotations 用小型3D注释数据集进行热启动
Step2: Continued training with LiDAR sweeps and images 使用激光雷达扫描和图像进行后续训练
Step3: Real-time inference through sparse convolution networks 通过稀疏卷积网络实现实时推断
Output: 3D semantic occupancy prediction 3D语义占用预测
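Real-time inference via sparse convolution (Step 3) rests on storing only occupied voxels. A minimal sketch of the sparse voxelization such networks consume (illustrative, not MinkOcc's implementation):

```python
def voxelize(points, voxel_size=0.5):
    """Hash 3D points into a sparse voxel grid: {voxel index: point count}.
    Only occupied voxels are stored, which is what lets sparse convolution
    backends skip the mostly-empty volume of a driving scene."""
    grid = {}
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        grid[key] = grid.get(key, 0) + 1
    return grid
```

A dense grid over a 100 m scene at this resolution would hold millions of cells; the sparse map holds only those that contain LiDAR returns.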
9.5 [9.5] 2504.02316 ConsDreamer: Advancing Multi-View Consistency for Zero-Shot Text-to-3D Generation
[{'name': 'Yuan Zhou, Shilong Jin, Litao Hua, Wanjun Lv, Haoran Duan, Jungong Han'}]
3D Generation 三维生成
text-to-3D generation
multi-view consistency
view biases
visual quality
geometry consistency
Input: Text descriptions 文本描述
Step1: View Disentanglement Module (VDM) 视图解耦模块
Step2: Similarity-based partial order loss 相似性基础的部分顺序损失
Output: Geometrically consistent 3D generation 几何一致的3D生成
9.5 [9.5] 2504.02337 LPA3D: 3D Room-Level Scene Generation from In-the-Wild Images
[{'name': 'Ming-Jia Yang, Yu-Xiao Guo, Yang Liu, Bin Zhou, Xin Tong'}]
3D Reconstruction and Modeling 三维重建 v2
3D room-level scene generation
NeRF
GAN
Input: In-the-wild images 从野外图像输入
Step1: Define local-pose-alignment (LPA) framework 定义局部姿态对齐框架
Step2: Implement LPA-GAN for scene generation 实现LPA-GAN进行场景生成
Step3: Co-optimize pose predictor and scene generation 联合优化姿态预测器和场景生成
Output: Generated 3D indoor scenes 生成的3D室内场景
9.5 [9.5] 2504.02356 All-day Depth Completion via Thermal-LiDAR Fusion
[{'name': 'Janghyun Kim, Minseong Kweon, Jinsun Park, Ukcheol Shin'}]
Depth Estimation 深度估计 v2
Depth Completion 深度补全
Thermal-LiDAR Fusion 热激光雷达融合
Autonomous Driving 自动驾驶
Input: Sparse LiDAR and RGB images 稀疏激光雷达和RGB图像
Step1: Benchmark existing algorithms 基准现有算法
Step2: Propose COntrastive learning and Pseudo-Supervision framework 提出对比学习和伪监督框架
Step3: Enhance depth boundary clarity 改进深度边界的清晰度
Output: Enhanced depth completion performance 改进的深度完成性能
9.5 [9.5] 2504.02437 MonoGS++: Fast and Accurate Monocular RGB Gaussian SLAM
[{'name': 'Renwu Li, Wenjing Ke, Dong Li, Lu Tian, Emad Barsoum'}]
Simultaneous Localization and Mapping (SLAM) 同时定位与地图构建 v2
3D Gaussian mapping
Simultaneous Localization and Mapping (SLAM)
RGB inputs
Visual odometry
Input: RGB images 仅输入RGB图像
Step1: Dynamic 3D Gaussian insertion 动态三维高斯插入
Step2: Gaussian densification module 高斯密集模块
Step3: Online visual odometry 视觉里程计
Output: Accurate 3D mapping 准确的三维映射
9.5 [9.5] 2504.02464 CornerPoint3D: Look at the Nearest Corner Instead of the Center
[{'name': 'Ruixiao Zhang, Runwei Guan, Xiangyu Chen, Adam Prugel-Bennett, Xiaohao Cai'}]
3D Reconstruction and Modeling 三维重建 v2
3D object detection
LiDAR point clouds
autonomous driving
Input: LiDAR point clouds from 3D sensors
Step1: Analyze object surfaces and centers
Step2: Develop EdgeHead for surface detection
Step3: Implement CornerPoint3D for corner prediction
Output: Enhanced 3D object detection performance
9.5 [9.5] 2504.02762 MD-ProjTex: Texturing 3D Shapes with Multi-Diffusion Projection
[{'name': 'Ahmet Burak Yildirim, Mustafa Utku Aydogdu, Duygu Ceylan, Aysegul Dundar'}]
Image and Video Generation 图像生成 v2
3D shapes
text-guided texture generation
multi-view consistency
Input: Pretrained text-to-image diffusion models 预训练的文本到图像扩散模型
Step1: Implement multi-diffusion consistency mechanism 实现多扩散一致性机制
Step2: Fuse noise predictions from multiple views 融合来自多个视角的噪声预测
Step3: Generate coherent textures for 3D shapes 生成一致的3D形状纹理
Output: Fast and consistent textured 3D models 速度快且一致的纹理3D模型
9.5 [9.5] 2504.02763 CanonNet: Canonical Ordering and Curvature Learning for Point Cloud Analysis
[{'name': 'Benjy Friedmann, Michael Werman'}]
Point Cloud Processing 点云处理 v2
point cloud processing
geometry
curvature estimation
neural networks
Input: Raw point clouds 原始点云
Step1: Preprocessing pipeline for canonical point ordering 预处理管道用于规范点排序
Step2: Geometric learning framework for curvature estimation 几何学习框架用于曲率估计
Output: Enhanced point cloud features 改进的点云特征
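Step 1's canonical point ordering can be illustrated with a much simpler invariant: sort points by distance to their centroid, which is stable under both input permutation and rigid rotation. This is a toy stand-in, far simpler than the paper's scheme:

```python
def canonical_order(points):
    """Order a 3D point cloud by squared distance to its centroid, giving
    a permutation- and rotation-invariant ordering (ties aside)."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    return sorted(points,
                  key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2)
```

Feeding the same cloud in two different orders yields an identical sequence, which is the property a downstream network relies on.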
9.5 [9.5] 2504.02764 Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model
[{'name': 'Shengjun Zhang, Jinzhao Li, Xin Fei, Hao Liu, Yueqi Duan'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D scene generation
video diffusion model
momentum
Input: Single image 单幅图像
Step1: Construct noisy samples from original features 从原始特征构建噪声样本
Step2: Introduce pixel-level momentum to generate video 引入像素级动量生成视频
Step3: Iteratively recover a 3D scene 迭代恢复3D场景
Output: High-fidelity 3D scene 高保真3D场景
9.5 [9.5] 2504.02817 Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization
[{'name': 'Kangle Deng, Hsueh-Ti Derek Liu, Yiheng Zhu, Xiaoxia Sun, Chong Shang, Kiran Bhat, Deva Ramanan, Jun-Yan Zhu, Maneesh Agrawala, Tinghui Zhou'}]
3D Generation 三维生成 v2
3D generation 3D生成
autoregressive models 自回归模型
adaptive tokenization 自适应标记化
Input: 3D shapes 3D形状
Step1: Adaptive tokenization 动态标记化
Step2: Octree construction 八叉树构建
Step3: Autoregressive shape generation 自回归形状生成
Output: High-quality 3D content 高质量3D内容
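Step 2's octree construction spends tokens adaptively where geometry is dense. A toy depth-first tokenizer over the unit cube (the 'S'/'L' token vocabulary and the stopping rule are illustrative assumptions, not the paper's tokenizer):

```python
def octree_tokens(points, center=(0.5, 0.5, 0.5), half=0.5,
                  max_pts=1, depth=0, max_depth=4):
    """Depth-first octree over points in the unit cube: any cell holding
    more than `max_pts` points is split into 8 children, emitting 'S'
    (split) or 'L' (leaf) tokens; dense regions therefore receive more
    tokens than empty ones."""
    if len(points) <= max_pts or depth == max_depth:
        return ['L']
    tokens = ['S']
    cx, cy, cz = center
    h = half / 2
    for dx in (-h, h):
        for dy in (-h, h):
            for dz in (-h, h):
                # Route each point to the octant matching its sign pattern.
                sub = [(x, y, z) for x, y, z in points
                       if (x >= cx) == (dx > 0) and (y >= cy) == (dy > 0)
                       and (z >= cz) == (dz > 0)]
                tokens += octree_tokens(sub, (cx + dx, cy + dy, cz + dz),
                                        h, max_pts, depth + 1, max_depth)
    return tokens
```

Two points in opposite corners split the root once and then terminate, so the sequence stays short for sparse shapes.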
9.0 [9.0] 2504.02480 Graph Attention-Driven Bayesian Deep Unrolling for Dual-Peak Single-Photon Lidar Imaging
[{'name': 'Kyungmin Choi, JaKeoung Koo, Stephen McLaughlin, Abderrahim Halimi'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
single-photon Lidar
Bayesian modeling
dual-peak imaging
Input: Single-photon Lidar data 单光子激光雷达数据
Step1: Histogram data processing 直方图数据处理
Step2: Dual peak feature extraction 双峰特征提取
Step3: Bayesian modeling and neural network unrolling 贝叶斯建模与神经网络展开
Output: 3D reconstruction results 3D重建结果
8.5 [8.5] 2504.02158 UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting
[{'name': 'Jaehoon Choi, Dongki Jung, Yonghan Lee, Sungmin Eum, Dinesh Manocha, Heesung Kwon'}]
3D Reconstruction and Modeling 三维重建 v2
digital twins 数字孪生
UAV 无人机
3D Gaussian Splatting 3D高斯点云
Input: UAV images UAV 图像
Step1: Foreground component synthesis 前景组件合成
Step2: Gaussian splatting integration 结合高斯点云
Step3: Data augmentation 数据增强
Output: Digital twin generation 数字孪生生成
8.5 [8.5] 2504.02264 MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception
[{'name': 'Wenzhuo Liu, Wenshuo Wang, Yicheng Qiao, Qiannan Guo, Jiayin Zhu, Pengfei Li, Zilong Chen, Huiming Yang, Zhiwei Li, Lening Wang, Tiao Tan, Huaping Liu'}]
Autonomous Driving 自动驾驶 v2
multimodal learning
driver assistance systems
multi-task learning
Input: Multimodal data (driving context, driver behavior) 多模态数据(驾驶上下文,驾驶员行为)
Step 1: Multi-axis region attention to extract features 从多轴区域关注提取特征
Step 2: Dual-branch multimodal embedding to adjust parameters 双支路多模态嵌入调整参数
Step 3: Evaluate on AIDE dataset 在AIDE数据集上评估
Output: Improved recognition performance 提升的识别性能
8.5 [8.5] 2504.02454 Taylor Series-Inspired Local Structure Fitting Network for Few-shot Point Cloud Semantic Segmentation
[{'name': 'Changshuo Wang, Shuting He, Xiang Fang, Meiqing Wu, Siew-Kei Lam, Prayag Tiwari'}]
Point Cloud Processing 点云处理 v2
few-shot learning
point cloud segmentation
3D reconstruction
Input: Point clouds and limited labeled data 点云和有限标注数据
Step1: Polynomial fitting for local structure representation 局部结构表示的多项式拟合
Step2: Development of TaylorConv for local structure fitting 开发TaylorConv以进行局部结构拟合
Step3: Constructing variants of TaylorSeg (TaylorSeg-NN, TaylorSeg-PN) 构建TaylorSeg的变体(TaylorSeg-NN,TaylorSeg-PN)
Output: Enhanced segmentation of unseen categories 改进的未见类别分割
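The polynomial fitting in Step 1 is, in one dimension, a closed-form quadratic (second-order Taylor) fit through three equally spaced samples. A minimal sketch of that building block, not the paper's TaylorConv operator:

```python
def local_quadratic_coeffs(f_prev, f_mid, f_next, h=1.0):
    """Fit a + b*t + c*t^2 through samples at t = -h, 0, +h.
    c approximates half the second derivative, the 1D analogue of the
    local curvature that a Taylor-style fitting operator estimates."""
    a = f_mid
    b = (f_next - f_prev) / (2 * h)          # central first difference
    c = (f_next - 2 * f_mid + f_prev) / (2 * h * h)  # half second difference
    return a, b, c
```

Sampling t² at t = -1, 0, 1 recovers the exact coefficients (0, 0, 1), since the data is itself quadratic.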
8.5 [8.5] 2504.02517 MultiNeRF: Multiple Watermark Embedding for Neural Radiance Fields
[{'name': 'Yash Kulthe, Andrew Gilbert, John Collomosse'}]
Neural Rendering 神经渲染 v2
3D watermarking
Neural Radiance Fields
intellectual property
3D content
Input: NeRF model with watermarking grid 采用带水印网格的NeRF模型
Step1: Extend TensoRF with watermark grid 扩展TensoRF以包含水印网格
Step2: Implement FiLM-based conditional modulation 实现基于FiLM的条件调制
Step3: Train the model with watermark embedding 训练模型以嵌入水印
Output: NeRF model with multiple watermarks 输出:带有多个水印的NeRF模型
8.5 [8.5] 2504.02617 PicoPose: Progressive Pixel-to-Pixel Correspondence Learning for Novel Object Pose Estimation
[{'name': 'Lihua Liu, Jiehong Lin, Zhenxin Liu, Kui Jia'}]
3D Reconstruction and Modeling 三维重建 v2
pose estimation
3D models
correspondence learning
Input: RGB images and CAD models RGB图像和CAD模型
Step1: Feature matching for coarse correspondences 特征匹配以获得粗略对应
Step2: Global transformation estimation for smooth correspondences 全局变换估计以平滑对应
Step3: Local refinement for fine correspondences 局部细化以优化对应
Output: 6D object pose estimation 6D物体姿态估计
8.5 [8.5] 2504.02782 GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation
[{'name': 'Zhiyuan Yan, Junyan Ye, Weijia Li, Zilong Huang, Shenghai Yuan, Xiangyang He, Kaiqing Lin, Jun He, Conghui He, Li Yuan'}]
Image Generation 图像生成 v2
image generation
benchmark
GPT-4o
Input: GPT-4o model outputs
Step1: Benchmark creation for evaluation
Step2: Qualitative and quantitative analysis of generated images
Step3: Comparative study with other models
Output: Insights on generative performance and limitations
8.5 [8.5] 2504.02812 BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation
[{'name': 'Van Nguyen Nguyen, Stephen Tyree, Andrew Guo, Mederic Fourmy, Anas Gouda, Taeyeop Lee, Sungphill Moon, Hyeontae Son, Lukas Ranftl, Jonathan Tremblay, Eric Brachmann, Bertram Drost, Vincent Lepetit, Carsten Rother, Stan Birchfield, Jiri Matas, Yann Labbe, Martin Sundermeyer, Tomas Hodan'}]
6D Object Pose Estimation 6D物体位姿估计 v2
6D pose estimation
object detection
model-based
model-free
Input: 6D object pose estimation task 6D物体位姿估计任务
Step1: Develop evaluation methodology 开发评估方法
Step2: Introduce new datasets 引入新数据集
Step3: Implement model-based and model-free approaches 实现基于模型和无模型的方法
Output: Results of the BOP Challenge 2024 2024 BOP挑战的结果
7.5 [7.5] 2504.02799 Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence
[{'name': 'Anita Rau, Mark Endo, Josiah Aklilu, Jaewoo Heo, Khaled Saab, Alberto Paderno, Jeffrey Jopling, F. Christopher Holsinger, Serena Yeung-Levy'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
surgical AI
Input: Large Vision-Language Models 视觉语言模型
Step1: Comprehensive analysis of VLMs 对VLM的综合分析
Step2: Performance evaluation on surgical tasks 对外科任务的性能评估
Step3: Insights on adaptability 适应性洞察
Output: Insights for surgical AI 外科人工智能的洞察
7.5 [7.5] 2504.02821 Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
[{'name': 'Mateusz Pach, Shyamgopal Karthik, Quentin Bouniot, Serge Belongie, Zeynep Akata'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Sparse Autoencoders
Vision-Language Models
Interpretability
Input: Sparse Autoencoders (SAEs) 稀疏自编码器
Step1: Framework introduction 框架介绍
Step2: Monosemanticity evaluation 单义性评估
Step3: Application to VLMs 应用到视觉语言模型
Output: Enhanced interpretability of VLMs 改进的视觉语言模型可解释性

Arxiv 2025-04-03

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2504.01023 Omnidirectional Depth-Aided Occupancy Prediction based on Cylindrical Voxel for Autonomous Driving
[{'name': 'Chaofan Wu, Jiaheng Li, Jinghao Cao, Ming Li, Yongkang Feng, Jiayu Wu, Shuwen Xu, Zihang Gao, Sidan Du, Yang Li'}]
3D Reconstruction and Modeling 三维重建 v2
3D perception
occupancy prediction
autonomous driving
cylindrical voxel
Input: Omnidirectional depth data 全向深度数据
Step1: Build cylindrical voxel representation 构建圆柱体体素表示
Step2: Implement Sketch-Coloring framework 实现素描上色框架
Step3: Evaluate occupancy prediction performance 评估占用预测性能
Output: Enhanced 3D occupancy prediction 改进的3D占用预测
9.5 [9.5] 2504.01503 Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment
[{'name': 'Ziteng Cui, Xuangeng Chu, Tatsuya Harada'}]
Neural Rendering 神经渲染 v2
3D Gaussian Splatting 3D高斯点云
novel view synthesis 新视图合成
lighting adaptation 光照适应
Input: Multi-view images 多视角图像
Step1: Image processing with per-view color matrix mapping 使用每视图的颜色矩阵映射进行图像处理
Step2: Curve adjustment to adapt to lighting conditions 曲线调整以适应光照条件
Step3: Joint optimization with 3DGS parameters 与3DGS参数共同优化
Output: Enhanced novel views 改进的新视图
9.5 [9.5] 2504.01512 High-fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model
[{'name': 'Yiyang Shen, Kun Zhou, He Wang, Yin Yang, Tianjia Shao'}]
3D Generation 三维生成 v2
3D generation 三维生成
Gaussian splatting 高斯点云
Input: Single-view images 单视图图像
Step1: Feature extraction 特征提取
Step2: Gaussian generation 高斯生成
Step3: 3D reconstruction 3D重建
Output: High-fidelity 3D objects 高保真3D物体
9.5 [9.5] 2504.01559 RealityAvatar: Towards Realistic Loose Clothing Modeling in Animatable 3D Gaussian Avatars
[{'name': 'Yahui Li, Zhi Zeng, Liming Pang, Guixuan Zhang, Shuwu Zhang'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
Dynamic Clothing Modeling
Animatable Avatars
Input: Multi-view videos 多视角视频
Step1: Motion trend modeling 动态趋势建模
Step2: Skeletal feature encoding 骨骼特征编码
Step3: Clothing deformation capture 服装变形捕捉
Output: High-fidelity animatable avatars 高保真动画化虚拟人像
9.5 [9.5] 2504.01619 3DBonsai: Structure-Aware Bonsai Modeling Using Conditioned 3D Gaussian Splatting
[{'name': 'Hao Wu, Hao Wang, Ruochong Li, Xuran Ma, Hui Xiong'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
bonsai generation
Gaussian splatting
Input: Text descriptions and conditions 输入: 文本描述和条件
Step1: Design trainable 3D space colonization algorithm 第一步: 设计可训练的三维空间殖民算法
Step2: Generate bonsai structures using structure-aware 3D Gaussian splatting 第二步: 使用结构感知的三维高斯点云生成盆栽结构
Step3: Evaluate model with 2D-3D consistency checks 第三步: 使用2D-3D一致性检查评估模型
Output: Complex 3D bonsai models 输出: 复杂的三维盆栽模型
9.5 [9.5] 2504.01641 Bridge 2D-3D: Uncertainty-aware Hierarchical Registration Network with Domain Alignment
[{'name': 'Zhixin Cheng, Jiacheng Deng, Xinjun Li, Baoqun Yin, Tianzhu Zhang'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
image registration
point cloud
Input: Image and point cloud data 图像和点云数据
Step1: Image-to-point cloud registration 基于图像至点云的配准
Step2: Uncertainty-aware matching 关注不确定性匹配
Step3: Domain alignment 域对齐
Output: Accurate transformations for 3D reconstruction 适用于三维重建的准确变换
9.5 [9.5] 2504.01647 FlowR: Flowing from Sparse to Dense 3D Reconstructions
[{'name': 'Tobias Fischer, Samuel Rota Bulò, Yung-Hsu Yang, Nikhil Varma Keetha, Lorenzo Porzi, Norman Müller, Katja Schwarz, Jonathon Luiten, Marc Pollefeys, Peter Kontschieder'}]
3D Reconstruction 三维重建 v2
3D reconstruction
novel view synthesis
multi-view
flow matching
Gaussian splatting
Input: A set of 2D images of a 3D scene 场景的二维图像集
Step1: Data collection and preprocessing 数据收集与预处理
Step2: 3D reconstruction using 3D Gaussian splatting 采用3D高斯喷溅进行三维重建
Step3: Flow matching to connect sparse and dense renderings 使用流匹配连接稀疏和密集渲染
Output: Improved novel view synthesis and 3D reconstruction 改进的视图合成和三维重建
9.5 [9.5] 2504.01732 FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking
[{'name': 'Ulas Gunes, Matias Turkulainen, Xuqian Ren, Arno Solin, Juho Kannala, Esa Rahtu'}]
3D Reconstruction and Modeling 三维重建 v2
3D scene reconstruction
fisheye image dataset
Input: Fisheye images 鱼眼图像
Step1: Dataset collection 数据集收集
Step2: Point cloud generation 点云生成
Step3: Model evaluation 模型评估
Output: Benchmarking results 基准测试结果
9.5 [9.5] 2504.01844 BOGausS: Better Optimized Gaussian Splatting
[{'name': 'Stéphane Pateux, Matthieu Gendrin, Luce Morin, Théo Ladune, Xiaoran Jiang'}]
Neural Rendering 神经渲染 v2
3D Gaussian Splatting
novel view synthesis
optimization
high-fidelity rendering
Input: 3D Gaussian Splatting data 3D高斯点云数据
Step1: Analyze training process 分析训练过程
Step2: Propose optimization methodology 提出优化方法
Step3: Model evaluation and comparison 模型评估与比较
Output: Optimized Gaussian models 优化的高斯模型
9.5 [9.5] 2504.01872 CoMatcher: Multi-View Collaborative Feature Matching
[{'name': 'Jintao Zhang, Zimin Xia, Mingyue Dong, Shuhan Shen, Linwei Yue, Xianwei Zheng'}]
Multi-view Stereo 多视角立体 v2
3D reconstruction
multi-view matching
deep learning
feature matching
Input: Image set of a scene 场景的图像集
Step1: Group images based on co-visibility 根据可见性分组图像
Step2: Collaborative matching using CoMatcher 使用CoMatcher进行协同匹配
Step3: Establish correspondence for 3D reconstruction 建立对应关系以进行3D重建
Output: Reliable multi-view matches 可靠的多视角匹配
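The co-visibility grouping in Step 1 can be approximated by a greedy overlap heuristic on shared feature tracks. A toy sketch (the grouping criterion and all names here are assumptions, not CoMatcher's actual strategy):

```python
def group_by_covisibility(tracks, min_shared=2):
    """Greedily group images that share at least `min_shared` feature
    tracks. `tracks` maps image id -> set of observed track ids; each
    image joins the first group it sufficiently overlaps with."""
    groups = []
    for img, feats in tracks.items():
        for g in groups:
            if len(feats & g['feats']) >= min_shared:
                g['imgs'].append(img)
                g['feats'] |= feats   # extend the group's covered tracks
                break
        else:
            groups.append({'imgs': [img], 'feats': set(feats)})
    return [g['imgs'] for g in groups]
```

Images that see the same scene region cluster together, while views with no shared tracks form their own group.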
9.5 [9.5] 2504.01901 Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
[{'name': 'Haochen Wang, Yucheng Zhao, Tiancai Wang, Haoqiang Fan, Xiangyu Zhang, Zhaoxiang Zhang'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
visual instruction tuning
Input: Multi-view images 多视角图像
Step1: Cross-view reconstruction 交叉视图重建
Step2: Global-view reconstruction 全局视图重建
Step3: 3D representation learning 3D 表示学习
Output: Enhanced understanding of 3D scenes 改进的三维场景理解
9.5 [9.5] 2504.01956 VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
[{'name': 'Hanyang Wang, Fangfu Liu, Jiawei Chi, Yueqi Duan'}]
3D Reconstruction and Modeling 三维重建 v2
3D scene generation
video diffusion models
sparse views
Input: Sparse views and corresponding camera poses 输入: 稀疏视图和对应的相机姿态
Step1: Coarse scene generation using a sparse-view 3DGS model 第一步: 使用稀疏视图3DGS模型生成粗略场景
Step2: Rapid distillation through a leap flow strategy 第二步: 通过跃流策略快速蒸馏
Step3: Denoising with a dynamic policy network 第三步: 使用动态策略网络去噪
Output: 3D scenes generated from video input 输出: 从视频输入生成的3D场景
9.5 [9.5] 2504.01957 GaussianLSS -- Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting
[{'name': 'Shu-Wei Lu, Yi-Hsuan Tsai, Yi-Ting Chen'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D perception
Depth estimation
Autonomous driving
Input: Multi-view images 多视角图像
Step1: Implement uncertainty modeling 实现不确定性建模
Step2: Transform depth distribution into 3D Gaussians 将深度分布转化为3D高斯分布
Step3: Rasterize for BEV feature construction 为BEV特征构建进行光栅化
Output: Uncertainty-aware BEV features 不确定性感知的BEV特征
9.5 [9.5] 2504.01960 Diffusion-Guided Gaussian Splatting for Large-Scale Unconstrained 3D Reconstruction and Novel View Synthesis
[{'name': 'Niluthpol Chowdhury Mithun, Tuan Pham, Qiao Wang, Ben Southall, Kshitij Minhas, Bogdan Matei, Stephan Mandt, Supun Samarasekera, Rakesh Kumar'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
view synthesis
Gaussian Splatting
multi-view
diffusion models
Input: Multi-view images 多视角图像
Step1: Generate pseudo-observations using a diffusion model 通过扩散模型生成伪观察
Step2: Apply 3D Gaussian Splatting for optimization 使用三维高斯点云进行优化
Step3: Integrate appearance embeddings and depth priors 集成外观嵌入和深度先验
Output: Enhanced 3D reconstruction and novel views 输出:改进的三维重建和新视图
9.2 [9.2] 2504.01476 Enhanced Cross-modal 3D Retrieval via Tri-modal Reconstruction
[{'name': 'Junlong Ren, Hao Wang'}]
Cross-modal 3D Retrieval 跨模态3D检索 v2
3D retrieval
multi-view images
point clouds
text modalities
Input: Multi-view images and point clouds 多视角图像和点云
Step1: Joint representation of 3D shapes 3D形状的联合表示
Step2: Tri-modal reconstruction 三模态重建
Step3: Fine-grained 2D-3D fusion 细粒度2D-3D融合
Output: Multimodal embeddings with enhanced alignment 输出:增强对齐的多模态嵌入
9.0 [9.0] 2504.01596 DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image
[{'name': 'Jijun Xiang, Xuan Zhu, Xianqi Wang, Yu Wang, Hong Zhang, Fei Guo, Xin Yang'}]
Depth Estimation 深度估计 v2
depth enhancement
dToF
3D reconstruction
depth completion
Input: Raw dToF signals and RGB images 原始dToF信号与RGB图像
Step1: Simulate real-world dToF data using synthetic datasets 使用合成数据集模拟真实世界的dToF数据
Step2: Develop a depth completion network integrating monocular depth estimation (MDE) 开发整合单目深度估计的深度补全网络
Step3: Perform training with noise-robust strategy 使用抗噪声的训练策略进行训练
Output: High-precision dense depth maps 高精度密集深度图
9.0 [9.0] 2504.01941 End-to-End Driving with Online Trajectory Evaluation via BEV World Model
[{'name': 'Yingyan Li, Yuqi Wang, Yang Liu, Jiawei He, Lue Fan, Zhaoxiang Zhang'}]
Autonomous Driving 自动驾驶 v2
autonomous driving
trajectory evaluation
world model
Input: Sensor data 传感器数据
Step1: Trajectory prediction 轨迹预测
Step2: Future state prediction 未来状态预测
Step3: Trajectory evaluation 轨迹评估
Output: Optimized trajectories 优化的轨迹
8.5 [8.5] 2504.01040 Cal or No Cal? -- Real-Time Miscalibration Detection of LiDAR and Camera Sensors
[{'name': 'Ilir Tahiraj, Jeremialie Swadiryus, Felix Fent, Markus Lienkamp'}]
Autonomous Systems and Robotics 自动驾驶 v2
miscalibration detection
sensor fusion
autonomous driving
3D sensing
Input: LiDAR and camera data LiDAR和摄像头数据
Step1: Feature extraction 特征提取
Step2: Miscalibration state classification 失调状态分类
Step3: Performance analysis 性能分析
Output: Detection results 检测结果
8.5 [8.5] 2504.01298 Direction-Aware Hybrid Representation Learning for 3D Hand Pose and Shape Estimation
[{'name': 'Shiyong Liu, Zhihao Li, Xiao Tang, Jianzhuang Liu'}]
3D Reconstruction and Modeling 三维重建 v2
3D hand pose estimation
direction-aware hybrid features
joint optimization
motion capture
Input: RGB images from hand motion capture
Step1: Fusion of implicit image features and explicit 2D joint coordinates
Step2: Joint optimization of 2D and 3D coordinates
Step3: Motion capture confidence calculation based on contrastive learning
Output: Improved accuracy in 3D hand pose and shape estimation
8.5 [8.5] 2504.01428 MuTri: Multi-view Tri-alignment for OCT to OCTA 3D Image Translation
[{'name': 'Zhuangzhuang Chen, Hualiang Wang, Chubin Ou, Xiaomeng Li'}]
3D Image Translation 三维图像翻译 v2
3D image translation
multi-view alignment
optical coherence tomography
OCTA
Input: 3D Optical Coherence Tomography (OCT) images 3D光学相干断层扫描图像
Step1: Pre-train VQ-VAE models for OCT & OCTA data 对OCT和OCTA数据进行VQ-VAE模型预训练
Step2: Multi-view tri-alignment to learn mapping from OCT to OCTA using three views 三视角联合对齐学习从OCT到OCTA的映射
Output: Translated 3D OCTA images 翻译后的3D OCTA图像
8.5 [8.5] 2504.01449 Multimodal Point Cloud Semantic Segmentation With Virtual Point Enhancement
[{'name': 'Zaipeng Duan, Xuzhong Hu, Pei An, Jie Ma'}]
Point Cloud Processing 点云处理 v2
Point Cloud Segmentation 点云分割
Multi-modal Integration 多模态集成
Input: LiDAR and image data (virtual points) 激光雷达与图像数据(虚拟点)
Step1: Integration of virtual points from images 通过图像整合虚拟点
Step2: Adaptive filtering to select valuable pseudo points 采用自适应过滤选择有价值的伪点
Step3: Noise-robust feature extraction 噪声稳健特征提取
Output: Enhanced semantic segmentation results 改进的语义分割结果
8.5 [8.5] 2504.01466 Mesh Mamba: A Unified State Space Model for Saliency Prediction in Non-Textured and Textured Meshes
[{'name': 'Kaiwei Zhang, Dandan Zhu, Xiongkuo Min, Guangtao Zhai'}]
3D Reconstruction and Modeling 三维重建 v2
mesh saliency
3D reconstruction
texture integration
Input: Mesh models 网格模型
Step1: Dataset creation 数据集创建
Step2: Model development 模型开发
Step3: Validation experiments 验证实验
Output: Saliency predictions for meshes 网格的显著性预测
8.5 [8.5] 2504.01620 A Conic Transformation Approach for Solving the Perspective-Three-Point Problem
[{'name': 'Haidong Wu, Snehal Bhayani, Janne Heikkilä'}]
3D Reconstruction and Modeling 三维重建 v2
Perspective-Three-Point problem
conic transformation
camera pose estimation
Input: 3D points and their 2D projections 3D点及其2D投影
Step1: Coordinate transformation 坐标变换
Step2: Solving for intersection points 求交点
Step3: Extracting camera pose 提取相机姿态
Output: Camera pose and optimized parameters 输出:相机位置和优化参数
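P3P solvers invert the pinhole projection from three 3D-2D correspondences and then pick among up to four candidate poses by reprojection error. A tiny sketch of that forward model and selection criterion (illustrative only, not the paper's conic transformation):

```python
def pinhole_project(pt, focal=1.0):
    """Forward pinhole model: camera-frame 3D point -> 2D image point."""
    x, y, z = pt
    return (focal * x / z, focal * y / z)

def reprojection_error(pts3d, pts2d, focal=1.0):
    """Mean L1 reprojection error of 2D observations against projections
    of camera-frame 3D points; the standard score for ranking candidate
    P3P solutions."""
    errs = [abs(u - pu) + abs(v - pv)
            for (u, v), (pu, pv) in zip(pts2d,
                                        (pinhole_project(p, focal) for p in pts3d))]
    return sum(errs) / len(errs)
```

A correct pose hypothesis drives this error to zero on noise-free correspondences, which is how a solver discards spurious roots.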
8.5 [8.5] 2504.01648 ProtoGuard-guided PROPEL: Class-Aware Prototype Enhancement and Progressive Labeling for Incremental 3D Point Cloud Segmentation
[{'name': 'Haosheng Li, Yuecong Xu, Junjie Chen, Kemi Ding'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D point cloud segmentation 3D点云分割
class-incremental learning 类增量学习
ProtoGuard
Input: 3D point clouds 3D点云
Step1: Base-class training with prototypes 基类训练与原型
Step2: Novel-class training with pseudo-labels 新类训练与伪标签
Step3: Evaluation of segmentations 分割的评估
Output: Enhanced segmentation accuracy 改进的分割精度
8.5 [8.5] 2504.01659 Robust Unsupervised Domain Adaptation for 3D Point Cloud Segmentation Under Source Adversarial Attacks
[{'name': 'Haosheng Li, Yuecong Xu, Junjie Chen, Kemi Ding'}]
3D Point Cloud Processing 点云处理 v2
3D point cloud segmentation
unsupervised domain adaptation
adversarial robustness
Input: 3D point cloud data 3D点云数据
Step1: Adversarial point cloud generation 对抗性点云生成
Step2: Dataset formulation 数据集构建
Step3: Framework development 框架开发
Output: Robust segmentation model 稳健的分割模型
8.5 [8.5] 2504.01668 Overlap-Aware Feature Learning for Robust Unsupervised Domain Adaptation for 3D Semantic Segmentation
[{'name': 'Junjie Chen, Yuecong Xu, Haosheng Li, Kemi Ding'}]
3D Semantic Segmentation 三维语义分割 v2
3D semantic segmentation
unsupervised domain adaptation
autonomous driving
Input: 3D point cloud data 3D点云数据
Step1: Robustness evaluation 评估鲁棒性
Step2: Invertible attention alignment 构建可逆注意力对齐模块
Step3: Contrastive memory bank construction 构建对比记忆库
Output: Enhanced segmentation performance 改进的分割性能
8.5 [8.5] 2504.01764 Dual-stream Transformer-GCN Model with Contextualized Representations Learning for Monocular 3D Human Pose Estimation
[{'name': 'Mingrui Ye, Lianping Yang, Hegui Zhu, Zenghao Zheng, Xin Wang, Yantao Lo'}]
3D Human Pose Estimation 3D人类姿态估计 v2
3D human pose estimation
Transformer
GCN
Input: RGB images and videos from a single viewpoint 使用单一视角的RGB图像和视频
Step1: Masking 2D pose features 对2D姿态特征进行掩蔽
Step2: Learning representations using Transformer-GCN model 使用Transformer-GCN模型学习表示
Step3: Adaptive fusion of features 特征的自适应融合
Output: Enhanced 3D human pose estimations 改进的3D人类姿态估计
8.0 [8.0] 2504.01589 Text Speaks Louder than Vision: ASCII Art Reveals Textual Biases in Vision-Language Models
[{'name': 'Zhaochen Wang, Yujun Cai, Zi Huang, Bryan Hooi, Yiwei Wang, Ming-Hsuan Yang'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models (VLMs) 视觉语言模型
ASCII Art ASCII艺术
Step1: Evaluate five state-of-the-art VLMs on ASCII art tasks 测试五个最先进的视觉语言模型
7.5 [7.5] 2504.01308 Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks
[{'name': 'Jiawei Wang, Yushen Zuo, Yuanjun Chai, Zhendong Liu, Yichen Fu, Yichun Feng, Kin-man Lam'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
Gaussian noise
adversarial attacks
Input: VLMs and noisy visual inputs (e.g., images with Gaussian noise)
Step1: Conduct vulnerability analysis of VLMs trained without noise augmentation
Step2: Develop Robust-VLGuard dataset with noise-augmented fine-tuning
Step3: Evaluate the performance of enhanced VLMs against adversarial perturbations
Output: A robust VLM framework able to handle Gaussian noise and improve functionality
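The noise-augmented fine-tuning in Step 2 starts from a simple corruption operator. A minimal, dependency-free sketch of Gaussian image corruption (illustrative only; the actual augmentation pipeline of the Robust-VLGuard dataset is not specified here):

```python
import random

def add_gaussian_noise(image, sigma=0.1, seed=None):
    """Corrupt a [0, 1] grayscale image (list of rows) with zero-mean
    Gaussian noise, clipping back into range; the kind of perturbation
    used for noise-augmented fine-tuning."""
    rng = random.Random(seed)   # seedable for reproducible augmentation
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]
```

Sweeping `sigma` produces the graded corruption levels needed to measure how model confidence degrades with noise.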

Arxiv 2025-04-02

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2503.22986 FreeSplat++: Generalizable 3D Gaussian Splatting for Efficient Indoor Scene Reconstruction
[{'name': 'Yunsong Wang, Tianxin Huang, Hanlin Chen, Gim Hee Lee'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
Gaussian splatting
indoor scene reconstruction
multi-view images
Input: Multi-view images 多视角图像
Step1: Low-cost Cross-View Aggregation framework 低成本跨视角聚合框架
Step2: Pixel-wise triplet fusion method 像素级三重融合方法
Step3: Weighted floater removal strategy 加权漂浮物去除策略
Step4: Depth-regularized per-scene fine-tuning 深度正则化的逐场景微调
Output: Enhanced 3D scene reconstruction 改进的三维场景重建
9.5 [9.5] 2503.23022 MeshCraft: Exploring Efficient and Controllable Mesh Generation with Flow-based DiTs
[{'name': 'Xianglong He, Junyi Chen, Di Huang, Zexiang Liu, Xiaoshui Huang, Wanli Ouyang, Chun Yuan, Yangguang Li'}]
Mesh Reconstruction 网格重建 v2
3D reconstruction
mesh generation
deep learning
Input: Raw mesh data 原始网格数据
Step1: Encode meshes into continuous tokens 编码网格为连续标记
Step2: Use flow-based model to generate meshes 使用基于流的模型生成网格
Step3: Output the final mesh based on face control 根据面数控制输出最终网格
9.5 [9.5] 2503.23024 Empowering Large Language Models with 3D Situation Awareness
[{'name': 'Zhihao Yuan, Yibo Peng, Jinke Ren, Yinghong Liao, Yatong Han, Chun-Mei Feng, Hengshuang Zhao, Guanbin Li, Shuguang Cui, Zhen Li'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
3D scene understanding
Vision-Language Models
situational awareness
Input: RGB-D videos RGB-D 视频
Step1: Data collection 数据收集
Step2: Caption generation 标题生成
Step3: Situation grounding 情境定位
Output: Situation-aware dataset 情境感知数据集
9.5 [9.5] 2503.23044 CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction
[{'name': 'Yuanyuan Gao, Hao Li, Jiaqi Chen, Zhengyu Zou, Zhihang Zhong, Dingwen Zhang, Xiao Sun, Junwei Han'}]
3D Reconstruction and Modeling 三维重建 v2
large-scale reconstruction 大规模重建
geometric accuracy 几何准确性
3D scene modeling 三维场景建模
autonomous driving 自动驾驶
Input: Multi-view images 多视角图像
Step1: Develop parallelized hybrid hierarchical 3D representation 构建并行化的混合层次三维表示
Step2: Implement batch-level multi-task rendering 采用批量级别的多任务渲染
Step3: Conduct experiments on large-scale datasets 在大规模数据集上进行实验
Output: Enhanced large-scale 3D scene models 改进的大规模三维场景模型
9.5 [9.5] 2503.23162 NeuralGS: Bridging Neural Fields and 3D Gaussian Splatting for Compact 3D Representations
[{'name': 'Zhenyu Tang, Chaoran Feng, Xinhua Cheng, Wangbo Yu, Junwu Zhang, Yuan Liu, Xiaoxiao Long, Wenping Wang, Li Yuan'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
neural fields
3D reconstruction
compression methods
multilayer perceptron
Input: Original 3D Gaussian Splatting (3DGS) data 原始三维高斯点云(3DGS)数据
Step1: Compute Gaussian importance scores 计算高斯重要性分数
Step2: Prune less important Gaussians 修剪不太重要的高斯
Step3: Cluster Gaussians based on attributes 根据属性聚类高斯
Step4: Fit separate MLPs for each cluster 为每个聚类拟合不同的多层感知器 (MLPs)
Step5: Fine-tune NeuralGS representation and apply frequency loss 对NeuralGS表示进行微调并应用频率损失
Output: Compact 3D representation with reduced storage requirements 输出:具有减小存储要求的紧凑3D表示
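The NeuralGS pipeline above lends itself to a small sketch. The snippet below is a toy stand-in for Steps 1–3 only (importance scoring, pruning, attribute clustering); the per-cluster MLP fitting and frequency loss are omitted, and the importance proxy (opacity × volume) is an assumption, not the paper's exact score.

```python
import numpy as np

def prune_and_cluster(opacity, scale, features, keep_ratio=0.6, k=4, iters=20, seed=0):
    """Toy stand-in for NeuralGS Steps 1-3: score, prune, and cluster Gaussians."""
    # Step1: a simple importance proxy -- opacity times axis-scale product.
    importance = opacity * scale.prod(axis=1)
    # Step2: keep only the highest-scoring fraction of Gaussians.
    n_keep = max(1, int(keep_ratio * len(importance)))
    kept = np.argsort(importance)[::-1][:n_keep]
    attrs = features[kept]
    # Step3: plain k-means over the surviving attribute vectors.
    rng = np.random.default_rng(seed)
    centers = attrs[rng.choice(len(attrs), k, replace=False)]
    for _ in range(iters):
        d = ((attrs[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = attrs[labels == c].mean(0)
    return kept, labels
```

Each cluster would then get its own tiny MLP fitted to the attributes, which is where the actual compression comes from.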
9.5 [9.5] 2503.23282 AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos
[{'name': 'Felix Wimbauer, Weirong Chen, Dominik Muhle, Christian Rupprecht, Daniel Cremers'}]
3D Reconstruction and Modeling 三维重建 v2
camera poses
intrinsics
3D reconstruction
SfM
dynamic videos
Input: Casual video inputs 随意拍摄的视频输入
Step1: Preprocess video with depth and flow networks 通过深度和流网络预处理视频
Step2: Apply transformer model to estimate camera poses and intrinsics 应用变换器模型估计相机姿态和内参
Step3: Implement trajectory refinement to reduce drift 实施轨迹优化以减少漂移
Output: Accurate camera poses, intrinsics, and 4D pointclouds 输出: 精确的相机姿态、内参和4D点云
9.5 [9.5] 2503.23297 ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning
[{'name': 'Zhenyang Liu, Yikai Wang, Sixiao Zheng, Tongying Pan, Longfei Liang, Yanwei Fu, Xiangyang Xue'}]
3D Visual Grounding 三维视觉定位 v2
3D visual grounding
open-vocabulary
neural rendering
Input: Implicit language descriptions 隐式语言描述
Step1: Adaptive grouping based on physical scale 基于物理尺度的自适应分组
Step2: 3D Gaussian feature splatting 3D高斯特征喷涂
Step3: Object localization 物体定位
Output: Accurate 3D grounding and reasoning 精确的3D定位与推理
9.5 [9.5] 2503.23337 Enhancing 3D Gaussian Splatting Compression via Spatial Condition-based Prediction
[{'name': 'Jingui Ma, Yang Hu, Luyang Tang, Jiayu Yang, Yongqi Zhai, Ronggang Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
Compression
Novel View Synthesis
Real-time Rendering
Input: 3D Gaussian representation 3D高斯表示
Step1: Introduce prediction technique 引入预测技术
Step2: Implement spatial condition-based prediction 实施基于空间条件的预测
Step3: Develop instance-aware hyper prior model 开发基于实例感知的超先验模型
Output: Compressed 3D Gaussian models 压缩的3D高斯模型
9.5 [9.5] 2503.23463 OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model
[{'name': 'Xingcheng Zhou, Xuyuan Han, Feng Yang, Yunpu Ma, Alois C. Knoll'}]
Autonomous Systems and Robotics 自动驾驶系统与机器人 v2
Vision-Language Model 视觉-语言模型
Autonomous Driving 自动驾驶
Trajectory Generation 轨迹生成
Input: Multimodal inputs (3D environmental perception, vehicle state, driver commands) 输入:多模态输入(3D环境感知、车辆状态、驾驶员命令)
Step1: Hierarchical vision-language alignment module 步骤1:分层视觉-语言对齐模块
Step2: Autoregressive interaction modeling 步骤2:自回归交互建模
Step3: Trajectory generation 步骤3:轨迹生成
Output: Reliable driving trajectories 输出:可靠的驾驶轨迹
9.5 [9.5] 2503.23502 Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model
[{'name': 'Jannik Endres, Oliver Hahn, Charles Corbière, Simone Schaub-Meyer, Stefan Roth, Alexandre Alahi'}]
Stereo Vision 立体视觉 v2
omnidirectional stereo matching
depth estimation
robotics
Input: Equirectangular images captured by two vertically stacked omnidirectional cameras 拍摄的两个垂直堆叠的全景相机的等距图像
Step1: Integrate the pre-trained monocular depth foundation model into the stereo matching architecture 将预训练的单目深度基础模型集成到立体匹配架构中
Step2: Apply a two-stage training strategy to adapt features to omnidirectional stereo matching 采用两阶段训练策略将特征适应于全景立体匹配
Step3: Fine-tune the model using scale-invariant loss against actual depth data 使用尺度不变损失对实际深度数据微调模型
Output: Enhanced disparity estimation and improved depth accuracy 改进的视差估计和深度准确性
9.5 [9.5] 2503.23664 LiM-Loc: Visual Localization with Dense and Accurate 3D Reference Maps Directly Corresponding 2D Keypoints to 3D LiDAR Point Clouds
[{'name': 'Masahiko Tsuji, Hitoshi Niigaki, Ryuichi Tanida'}]
3D Reconstruction and Modeling 三维重建与建模 v2
Visual Localization 视觉定位
3D Reconstruction 三维重建
LiDAR
Camera Pose Estimation 相机姿态估计
Input: Query image and 3D LiDAR point clouds 查询图像和3D LiDAR点云
Step1: Extract keypoints from the reference image 从参考图像中提取关键点
Step2: Generate a 3D reference map with keypoints using LiDAR 生成包含关键点的3D参考地图,使用LiDAR
Step3: Assign 3D LiDAR points directly to 2D keypoints 直接将3D LiDAR点分配给2D关键点
Output: Enhanced camera pose estimation through a dense 3D reference map 输出:通过密集的3D参考地图增强相机姿态估计
9.5 [9.5] 2503.23670 Learning Bijective Surface Parameterization for Inferring Signed Distance Functions from Sparse Point Clouds with Grid Deformation
[{'name': 'Takeshi Noda, Chao Chen, Junsheng Zhou, Weiqi Zhang, Yu-Shen Liu, Zhizhong Han'}]
Surface Reconstruction 表面重建 v2
3D reconstruction
signed distance functions
sparse point clouds
Input: Sparse point clouds 稀疏点云
Step1: Learn bijective surface parameterization (BSP) 学习双射表面参数化
Step2: Construct dynamic deformation network 动态变形网络构建
Step3: Optimize grid deformation to refine surfaces 优化网格变形以精炼表面
Output: Signed distance functions (SDF) representation of the surface 表面的符号距离函数表示
9.5 [9.5] 2503.23684 Detail-aware multi-view stereo network for depth estimation
[{'name': 'Haitao Tian, Junyang Li, Chenxing Wang, Helong Jiang'}]
Multi-view Stereo 多视角立体 v2
Multi-view stereo
Depth estimation
3D reconstruction
Geometric depth
Input: Multi-view images 多视角图像
Step1: Geometric depth embedding 几何深度嵌入
Step2: Image synthesis loss enhancement 图像合成损失增强
Step3: Adaptive depth interval adjustment 自适应深度区间调整
Output: Accurate depth maps 精确的深度图
9.5 [9.5] 2503.23747 Consistency-aware Self-Training for Iterative-based Stereo Matching
[{'name': 'Jingyi Zhou, Peng Ye, Haoyu Zhang, Jiakang Yuan, Rao Qiang, Liu YangChenXu, Wu Cailin, Feng Xu, Tao Chen'}]
Multi-view Stereo 多视角立体 v2
stereo matching
depth estimation
3D vision
Input: Pairs of rectified images 成对的校正图像
Step1: Introduce consistency-aware self-training framework 引入一致性自我训练框架
Step2: Implement consistency-aware soft filtering module 实现一致性软过滤模块
Step3: Adjust weights of pseudo-labels with soft-weighted loss 使用软加权损失调整伪标签权重
Output: Enhanced stereo matching performance 提升的立体匹配性能
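The soft-filtering idea in Steps 2–3 can be sketched as a consistency-weighted pseudo-label loss. Everything below is hypothetical: `iter_variance` stands in for whatever consistency measure the paper derives from the iterative refinements, and the exponential weighting is one plausible choice, not the paper's exact loss.

```python
import numpy as np

def soft_weighted_loss(pred_disp, pseudo_disp, iter_variance, tau=1.0):
    """Down-weight pseudo-labels whose disparity is inconsistent across
    refinement iterations.

    iter_variance: per-pixel variance of the teacher's disparity over its
    iterations (a hypothetical consistency measure)."""
    w = np.exp(-iter_variance / tau)   # consistent pixels -> weight close to 1
    return float(np.sum(w * np.abs(pred_disp - pseudo_disp)) / np.sum(w))
```

Pixels whose teacher predictions oscillate between iterations contribute almost nothing, which is the "soft filtering" effect.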
9.5 [9.5] 2503.23881 ExScene: Free-View 3D Scene Reconstruction with Gaussian Splatting from a Single Image
[{'name': 'Tianyi Gong, Boyan Li, Yifei Zhong, Fangxin Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
single-view reconstruction
Gaussian Splatting
panoramic image generation
Input: Single-view image 单视图图像
Step1: Generate panoramic image 生成全景图像
Step2: Depth estimation 深度估计
Step3: 3D Gaussian Splatting model training 训练3D高斯点云模型
Step4: Refinement with video diffusion 通过视频扩散进行优化
Output: Consistent immersive 3D scene 一致的沉浸式3D场景
9.5 [9.5] 2503.23965 Video-based Traffic Light Recognition by Rockchip RV1126 for Autonomous Driving
[{'name': 'Miao Fan, Xuxu Kong, Shengtong Xu, Haoyi Xiong, Xiangzeng Liu'}]
Autonomous Driving 自动驾驶 v2
traffic light recognition
autonomous driving
neural networks
real-time processing
Input: Multi-frame video data 多帧视频数据
Step1: Temporal data integration 时间数据集成
Step2: Neural network architecture design 神经网络架构设计
Step3: Real-time processing capabilities evaluation 实时处理能力评估
Output: Robust traffic light recognition results 稳健的交通灯识别结果
9.5 [9.5] 2503.23993 DenseFormer: Learning Dense Depth Map from Sparse Depth and Image via Conditional Diffusion Model
[{'name': 'Ming Yuan, Sichao Wang, Chuang Zhang, Lei He, Qing Xu, Jianqiang Wang'}]
Depth Estimation 深度估计 v2
depth completion
autonomous driving
diffusion model
3D reconstruction
Input: Sparse depth maps and RGB images 稀疏深度图和 RGB 图像
Step1: Feature extraction 特征提取
Step2: Conditional diffusion process 条件扩散过程
Step3: Multi-step iterative refinement 多步迭代优化
Output: Dense depth map 生成密集深度图
9.5 [9.5] 2503.24210 DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting
[{'name': 'Seungjun Lee, Gim Hee Lee'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D reconstruction
motion deblurring
event streams
Input: Blurry multi-view images and event streams 模糊的多视角图像和事件流
Step1: Jointly optimize deblurring 3DGS with captured event streams and a pretrained diffusion model 通过联合利用实际捕获的事件流和预训练的扩散模型约束去模糊3DGS
Step2: Introduce EDI constraints 引入事件双积分约束
Step3: Leverage diffusion prior 为了进一步改善细节,利用扩散先验
Output: Enhanced 3D representations 改进的3D表示
9.5 [9.5] 2503.24229 Pre-training with 3D Synthetic Data: Learning 3D Point Cloud Instance Segmentation from 3D Synthetic Scenes
[{'name': 'Daichi Otsuka, Shinichi Mae, Ryosuke Yamada, Hirokatsu Kataoka'}]
3D Point Cloud Processing 点云处理 v2
3D point cloud segmentation
synthetic data
generative models
Input: 3D point cloud data 3D点云数据
Step1: Data generation 数据生成
Step2: Model training 模型训练
Step3: Model evaluation 模型评估
Output: Improved instance segmentation results 改进的实例分割结果
9.5 [9.5] 2503.24374 ERUPT: Efficient Rendering with Unposed Patch Transformer
[{'name': 'Maxim V. Shugaev, Vincent Chen, Maxim Karrenbach, Kyle Ashley, Bridget Kennedy, Naresh P. Cuntoor'}]
3D Reconstruction and Modeling 三维重建 v2
novel view synthesis
scene reconstruction
unposed imagery
3D reconstruction
computer vision
Input: Small collections of RGB images 小规模RGB图像集
Step1: Patch-based querying of unposed imagery 基于补丁的无姿势图像查询
Step2: Latent camera pose learning 学习潜在相机姿态
Step3: Efficient model rendering and training 模型的高效渲染和训练
Output: High-quality rendered images 高质量渲染图像
9.5 [9.5] 2503.24382 Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views
[{'name': 'Chong Bao, Xiyu Zhang, Zehao Yu, Jiale Shi, Guofeng Zhang, Songyou Peng, Zhaopeng Cui'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
neural rendering
view synthesis
layered Gaussian
Input: Extremely sparse views (3-4) 极稀疏视图
Step1: Use dense stereo reconstruction to recover coarse geometry 使用稠密立体重建恢复粗糙几何
Step2: Apply layered Gaussian representation for scene modeling 应用分层高斯表示进行场景建模
Step3: Integrate reconstruction and generation iteratively 迭代整合重建与生成
Output: High-quality 3D reconstruction and unbounded view synthesis 输出: 高质量三维重建和无界视图合成
9.2 [9.2] 2503.23882 GLane3D : Detecting Lanes with Graph of 3D Keypoints
[{'name': 'Halil İbrahim Öztürk, Muhammet Esat Kalfaoğlu, Ozsel Kilinc'}]
3D Reconstruction and Modeling 三维重建 v2
3D lane detection 3D车道检测
autonomous driving 自动驾驶
Input: Multi-view images 多视角图像
Step1: Keypoint detection 关键点检测
Step2: Sequential connection prediction 顺序连接预测
Step3: Lane extraction 车道提取
Output: Complete 3D lanes 完整的三维车道
9.2 [9.2] 2503.24366 StochasticSplats: Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting
[{'name': 'Shakiba Kheradmand, Delio Vicini, George Kopanas, Dmitry Lagun, Kwang Moo Yi, Mark Matthews, Andrea Tagliasacchi'}]
Neural Rendering 神经渲染 v2
3D Gaussian splatting
stochastic rasterization
neural rendering
Input: 3D Gaussian splatting 3D高斯点云
Step1: Implement stochastic rasterization 实现随机光栅化
Step2: Use Monte Carlo estimator 使用蒙特卡罗估计器
Step3: Render using OpenGL shaders 使用OpenGL着色器渲染
Output: Fast and high-quality rendering 快速高质量渲染
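The sorting-free idea admits a compact sketch: with stochastic rasterization, each splat survives a Bernoulli(α) test and the nearest survivor wins the pixel, so only a per-sample `min` over depths is needed, and the estimator's expectation matches classic depth-sorted compositing. This is a minimal scalar-color illustration, not the paper's OpenGL shader implementation.

```python
import numpy as np

def composite_sorted(color, alpha, depth, bg=0.0):
    """Reference: classic front-to-back alpha compositing (needs a depth sort)."""
    out, trans = 0.0, 1.0
    for i in np.argsort(depth):
        out += trans * alpha[i] * color[i]
        trans *= 1.0 - alpha[i]
    return out + trans * bg

def composite_stochastic(color, alpha, depth, n_samples=200_000, bg=0.0, seed=0):
    """Sorting-free Monte Carlo estimate: per sample, each splat is kept with
    probability alpha and the nearest kept splat wins -- a min, not a sort."""
    rng = np.random.default_rng(seed)
    keep = rng.random((n_samples, len(alpha))) < alpha   # (S, N) Bernoulli draws
    d = np.where(keep, depth, np.inf)
    nearest = d.argmin(axis=1)                           # nearest surviving splat
    hit = keep.any(axis=1)
    samples = np.where(hit, color[nearest], bg)
    return samples.mean()
```

Averaging many such samples converges to the sorted result, which is why the method can drop the global sort entirely.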
9.2 [9.2] 2503.24391 Easi3R: Estimating Disentangled Motion from DUSt3R Without Training
[{'name': 'Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, Anpei Chen'}]
3D Reconstruction and Modeling 三维重建 v2
4D reconstruction
dynamic segmentation
camera pose estimation
Input: Dynamic image collections 动态图像集
Step1: Attention adaptation during inference 推理期间注意力适应
Step2: Dynamic object segmentation 动态目标分割
Step3: Camera pose estimation 相机位姿估计
Output: 4D dense point map reconstruction 4D稠密点图重建
9.0 [9.0] 2503.23587 PhysPose: Refining 6D Object Poses with Physical Constraints
[{'name': 'Martin Malenický, Martin Cífka, Médéric Fourmy, Louis Montaut, Justin Carpentier, Josef Sivic, Vladimir Petrik'}]
Robotic Perception 机器人感知 v2
6D object pose estimation
physical constraints
robotics
scene reconstruction
autonomous driving
Input: Images and geometric scene description (输入: 图像和几何场景描述)
Step 1: Estimate initial 6D object poses (步骤 1: 估计初始的 6D 物体姿态)
Step 2: Post-process to enforce physical consistency (步骤 2: 后处理以强制物理一致性)
Step 3: Evaluate and refine pose estimates (步骤 3: 评估和改进姿态估计)
Output: Accurate and physically plausible object poses (输出: 准确且物理上合理的物体姿态)
9.0 [9.0] 2503.23963 A Benchmark for Vision-Centric HD Mapping by V2I Systems
[{'name': 'Miao Fan, Shanshan Yu, Shengtong Xu, Kun Jiang, Haoyi Xiong, Xiangzeng Liu'}]
3D Reconstruction and Modeling 三维重建 v2
Vehicle-to-Infrastructure (V2I)
HD mapping
autonomous driving
neural framework
vectorized maps
Input: Collaborative camera frames from vehicles and infrastructures 车辆与基础设施的协作摄像头帧
Step1: Data collection and annotation 数据收集与标注
Step2: Extract features from images 提取图像特征
Step3: Construct BEV representation 构建鸟瞰视图表示
Step4: Generate and update vectorized maps 生成并更新矢量化地图
Output: Vectorized high-definition maps 矢量化高清地图
8.5 [8.5] 2503.22963 SuperEIO: Self-Supervised Event Feature Learning for Event Inertial Odometry
[{'name': 'Peiyu Chen, Fuling Lin, Weipeng Guan, Peng Lu'}]
Visual Odometry 视觉里程计 v2
event camera
inertial odometry
3D reconstruction
sensor fusion
Input: Event streams from event cameras 事件相机的事件流
Step1: Event feature detection using CNN 使用CNN进行事件特征检测
Step2: Descriptor matching for loop closure using GNN 使用GNN进行环路闭合的描述符匹配
Step3: Optimize pipeline using TensorRT 使用TensorRT优化管道
Output: Robust event-inertial odometry 可靠的事件惯性里程计
8.5 [8.5] 2503.22976 From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D
[{'name': 'Jiahui Zhang, Yurui Chen, Yanpeng Zhou, Yueming Xu, Ze Huang, Jilin Mei, Junhui Chen, Yu-Jie Yuan, Xinyue Cai, Guowei Huang, Xingyue Quan, Hang Xu, Li Zhang'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
vision-language models
3D reasoning
dataset creation
Input: 2D images and 3D ground-truth data 2D图像和三维真实数据
Step1: Data generation and annotation 数据生成与标注
Step2: Dataset creation for spatial tasks 数据集创建用于空间任务
Step3: Benchmark development for evaluation 基准开发用于评估
Output: Enhanced spatial reasoning capabilities 改进的空间推理能力
8.5 [8.5] 2503.23062 Shape and Texture Recognition in Large Vision-Language Models
[{'name': 'Sagi Eppel, Mor Bismut, Alona Faktor'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
shape recognition
texture recognition
vision-language models
3D understanding
Input: Real-world images 真实世界图像
Step1: Dataset creation 数据集创建
Step2: Shape and texture recognition tests 形状和纹理识别测试
Step3: Evaluation of large vision-language models 大型视觉语言模型的评估
Output: Performance metrics on shape and texture recognition 形状和纹理识别的性能指标
8.5 [8.5] 2503.23105 Open-Vocabulary Semantic Segmentation with Uncertainty Alignment for Robotic Scene Understanding in Indoor Building Environments
[{'name': 'Yifan Xu, Vineet Kamat, Carol Menassa'}]
Robotic Perception 机器人感知 v2
semantic segmentation
autonomous assistive robots
Input: Built environment scenes 场景输入
Step1: Scene segmentation 场景分割
Step2: Semantic recognition 语义识别
Step3: Uncertainty alignment 不确定性对齐
Output: Adaptive navigation model 自适应导航模型
8.5 [8.5] 2503.23109 Uncertainty-Instructed Structure Injection for Generalizable HD Map Construction
[{'name': 'Xiaolu Liu, Ruizi Yang, Song Wang, Wentong Li, Junbo Chen, Jianke Zhu'}]
Autonomous Systems and Robotics 自动驾驶系统与机器人 v2
HD map construction
autonomous vehicles
Input: HD maps 高精度地图
Step1: Uncertainty resampling 不确定性重采样
Step2: Structural feature extraction 结构特征提取
Step3: Map vectorization 地图矢量化
Output: Generalized HD maps 泛化的高精度地图
8.5 [8.5] 2503.23313 SpINR: Neural Volumetric Reconstruction for FMCW Radars
[{'name': 'Harshvardhan Takawale, Nirupam Roy'}]
Volumetric Reconstruction 体积重建 v2
volumetric reconstruction
FMCW radar
3D modeling
Input: FMCW radar data
Step1: Frequency-domain modeling
Step2: Implicit neural representation training
Step3: 3D volumetric geometry reconstruction
Output: High-resolution 3D models
8.5 [8.5] 2503.23331 HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation
[{'name': 'Hongwei Zheng, Han Li, Wenrui Dai, Ziyang Zheng, Chenglin Li, Junni Zou, Hongkai Xiong'}]
3D Reconstruction 三维重建 v2
3D human pose estimation
occlusion
hierarchical poses
sparse representation
Input: Sparse 2D poses from monocular images 单目图像中的稀疏2D姿势
Step1: Multi-scale skeleton tokenization 多尺度骨架标记
Step2: Hierarchical pose generation 分层姿势生成
Step3: 2D-to-3D lifting with generated poses 通过生成的姿势进行2D到3D的提升
Output: Enhanced 3D human poses 改进的3D人体姿势
8.5 [8.5] 2503.23365 OnSiteVRU: A High-Resolution Trajectory Dataset for High-Density Vulnerable Road Users
[{'name': 'Zhangcun Yan, Jianqing Li, Peng Hang, Jian Sun'}]
Autonomous Systems and Robotics 自动驾驶系统与机器人学 v2
High-resolution trajectory data 高分辨率轨迹数据
Vulnerable Road Users 弱势交通参与者
Autonomous driving 自动驾驶
Input: High-resolution trajectory data 高分辨率轨迹数据
Step1: Data collection 数据收集
Step2: Data integration 数据集成
Step3: Analysis of behavioral patterns 行为模式分析
Output: Comprehensive dataset for autonomous driving systems 输出: 自动驾驶系统的综合数据集
8.5 [8.5] 2503.23519 BoundMatch: Boundary detection applied to semi-supervised segmentation for urban-driving scenes
[{'name': 'Haruya Ishikawa, Yoshimitsu Aoki'}]
Autonomous Systems and Robotics 自动驾驶 v2
semi-supervised segmentation
boundary detection
autonomous driving
Input: Unlabeled images and labeled data; 输入: 无标签图像和有标签数据
Step1: Implement Boundary-Semantic Fusion to combine boundary cues with segmentation; 步骤1: 实施边界-语义融合,将边界线索与分割结合
Step2: Integrate Boundary Consistency Regularized Multi-Task Learning; 步骤2: 集成边界一致性正则化多任务学习
Step3: Evaluate model performance on various datasets; 步骤3: 在各种数据集上评估模型性能
Output: Enhanced segmentation masks with improved boundary delineation; 输出: 改进的分割掩码,具有更好的边界划分
8.5 [8.5] 2503.23577 Multiview Image-Based Localization
[{'name': 'Cameron Fiore, Hongyi Fan, Benjamin Kimia'}]
3D Localization 3D定位 v2
3D localization
image retrieval
autonomous driving
multiview correspondences
Input: Query image and anchor images 查询图像和锚图像
Step1: Compute NetVLAD descriptors and SuperPoint features 计算NetVLAD描述符和SuperPoint特征
Step2: Retrieve top-K anchor images 根据特征描述符检索前K个锚图像
Step3: Estimate relative poses 估计相对位姿
Step4: Find optimal camera center and orientation 查找最佳相机中心和方向
Output: Accurate pose estimation 输出:准确的位姿估计
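Step 2 of this pipeline (retrieving top-K anchor images by global descriptor similarity) reduces to a nearest-neighbour search. The sketch below uses plain cosine similarity, with random-looking vectors standing in for real NetVLAD descriptors; the function name and shapes are illustrative assumptions.

```python
import numpy as np

def topk_anchors(query_desc, anchor_descs, k=3):
    """Return indices and similarities of the k anchors most similar to the
    query under cosine similarity (a stand-in for NetVLAD retrieval)."""
    q = query_desc / np.linalg.norm(query_desc)
    A = anchor_descs / np.linalg.norm(anchor_descs, axis=1, keepdims=True)
    sims = A @ q
    top = np.argsort(sims)[::-1][:k]
    return top, sims[top]
```

The retrieved anchors then supply the multiview correspondences used for relative-pose estimation in Steps 3–4.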
8.5 [8.5] 2503.23606 Blurry-Edges: Photon-Limited Depth Estimation from Defocused Boundaries
[{'name': 'Wei Xu, Charles James Wagner, Junjie Luo, Qi Guo'}]
Depth Estimation 深度估计 v2
depth estimation
depth from defocus
neural networks
Input: Pair of differently defocused images 输入: 两张不同离焦的图像
Step1: Data representation through Blurry-Edges 通过模糊边缘进行数据表示
Step2: Depth calculation using closed-form DfD relation 使用封闭形式的DfD关系计算深度
Output: Depth estimation along the boundaries 输出:沿边界的深度估计
8.5 [8.5] 2503.23647 Introducing the Short-Time Fourier Kolmogorov Arnold Network: A Dynamic Graph CNN Approach for Tree Species Classification in 3D Point Clouds
[{'name': 'Said Ohamouddou, Mohamed Ohamouddou, Hanaa El Afia, Abdellatif El Afia, Rafik Lasri, Raddouane Chiheb'}]
3D Point Cloud Processing 点云处理 v2
3D point cloud
tree species classification
deep learning
STFT-KAN
Input: 3D point clouds 3D点云
Step1: Implementation of STFT-KAN STFT-KAN的实现
Step2: Model training and evaluation 模型训练与评估
Output: Tree species classification results 树种分类结果
8.5 [8.5] 2503.23702 3D Dental Model Segmentation with Geometrical Boundary Preserving
[{'name': 'Shufan Xi, Zexian Liu, Junlin Chang, Hongyu Wu, Xiaogang Wang, Aimin Hao'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D reconstruction
segmentation
digital dentistry
computer vision
Input: 3D intraoral scan mesh 3D口腔扫描网格
Step1: Selective downsampling method 选择性下采样方法
Step2: Boundary feature extraction 边界特征提取
Step3: Model evaluation 模型评估
Output: Improved segmentation accuracy 改进的分割精度
8.5 [8.5] 2503.23980 SALT: A Flexible Semi-Automatic Labeling Tool for General LiDAR Point Clouds with Cross-Scene Adaptability and 4D Consistency
[{'name': 'Yanbo Wang, Yongtao Chen, Chuan Cao, Tianchen Deng, Wentao Zhao, Jingchuan Wang, Weidong Chen'}]
3D Reconstruction and Modeling 三维重建 v2
LiDAR
Semantic segmentation
Zero-shot learning
Input: General LiDAR point clouds 一般LiDAR点云
Step1: Data transformation 数据转换
Step2: Zero-shot learning paradigm implementation 零样本学习范式实现
Step3: Pre-segmentation result generation 预分割结果生成
Output: Enhanced annotation efficiency 改进的注释效率
8.5 [8.5] 2503.24091 4D mmWave Radar in Adverse Environments for Autonomous Driving: A Survey
[{'name': 'Xiangyuan Peng, Miao Tang, Huawei Sun, Lorenzo Servadei, Robert Wille'}]
Autonomous Driving 自动驾驶 v2
4D mmWave radar
autonomous driving
adverse environments
Input: 4D mmWave radar data 4D毫米波雷达数据
Step1: Review of existing datasets 现有数据集的回顾
Step2: Analysis of methods and models 方法和模型的分析
Step3: Discussion on challenges and future directions 挑战与未来方向的讨论
Output: Comprehensive survey report 综合调查报告
8.5 [8.5] 2503.24129 It's a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data
[{'name': 'Dominik Schnaus, Nikita Araslanov, Daniel Cremers'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
vision-language correspondence
unsupervised learning
quadratic assignment problem
Input: Vision and language embeddings 视觉和语言嵌入
Step1: Formulate unsupervised matching as a quadratic assignment problem 将无监督匹配形式化为二次分配问题
Step2: Develop a heuristic for matching 提出匹配的启发式方法
Step3: Evaluate on datasets 在数据集上评估
Output: Unsupervised classification results without annotations 无需注释的无监督分类结果
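The quadratic-assignment formulation can be illustrated at toy scale: for a handful of items one can brute-force the permutation whose pairwise-distance matrix in language space best mirrors the one in vision space, using no paired data at all. The distance-consistency cost below is a generic choice, not necessarily the paper's exact QAP instance.

```python
import itertools
import numpy as np

def blind_match(vis_emb, lang_emb):
    """Brute-force blind matching for tiny n: find the permutation of
    language items whose pairwise-distance matrix best matches the
    vision one (no parallel vision-language pairs used)."""
    def dists(X):
        return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    Dv, Dl = dists(vis_emb), dists(lang_emb)
    n = len(vis_emb)
    best, best_cost = None, np.inf
    for perm in itertools.permutations(range(n)):
        p = list(perm)
        cost = ((Dv - Dl[np.ix_(p, p)]) ** 2).sum()
        if cost < best_cost:
            best, best_cost = p, cost
    return best
```

For realistic vocabulary sizes the factorial search is hopeless, which is exactly why the paper needs a QAP heuristic rather than enumeration.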
8.5 [8.5] 2503.24270 Visual Acoustic Fields
[{'name': 'Yuelei Li, Hyunjin Kim, Fangneng Zhan, Ri-Zhao Qiu, Mazeyu Ji, Xiaojun Shan, Xueyan Zou, Paul Liang, Hanspeter Pfister, Xiaolong Wang'}]
Neural Rendering 神经渲染 v2
3D Gaussian Splatting
sound localization
sound generation
Input: Multiscale features from 3D Gaussian Splatting (3DGS)
Step1: Sound generation module utilizing a conditional diffusion model
Step2: Sound localization module for querying impact positions in 3D scene
Output: Generated sounds and localized impact sources in 3D space
8.5 [8.5] 2503.24306 Point Tracking in Surgery--The 2024 Surgical Tattoos in Infrared (STIR) Challenge
[{'name': 'Adam Schmidt, Mert Asim Karaoglu, Soham Sinha, Mingang Jang, Ho-Gun Ha, Kyungmin Jung, Kyeongmo Gu, Ihsan Ullah, Hyunki Lee, Jonáš Šerých, Michal Neoral, Jiří Matas, Rulin Zhou, Wenlong He, An Wang, Hongliang Ren, Bruno Silva, Sandro Queirós, Estêvão Lima, João L. Vilaça, Shunsuke Kikuchi, Atsushi Kouno, Hiroki Matsuzaki, Tongtong Li, Yulu Chen, Ling Li, Xiang Ma, Xiaojian Li, Mona Sheikh Zeinoddin, Xu Wang, Zafer Tandogdu, Greg Shaw, Evangelos Mazomenos, Danail Stoyanov, Yuxin Chen, Zijian Wu, Alexander Ladikos, Simon DiMaio, Septimiu E. Salcudean, Omid Mohareri'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
autonomous driving
surgery
algorithms
Input: Infrared video sequences 红外视频序列
Step1: Data quantification 数据量化
Step2: Algorithm submission 提交算法
Step3: Performance evaluation 性能评估
Output: Algorithm performance metrics 算法性能指标
8.5 [8.5] 2503.24381 UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving
[{'name': 'Yuping Wang, Xiangyu Huang, Xiaokang Sun, Mingxuan Yan, Shuo Xing, Zhengzhong Tu, Jiachen Li'}]
Autonomous Driving 自动驾驶 v2
Occupancy forecasting 占用预测
Autonomous driving 自动驾驶
3D Occupancy labels 三维占用标签
Input: Camera images 摄像头图像
Step1: Data integration 数据集成
Step2: Occupancy forecasting 模型预测
Step3: Evaluation of performance 性能评估
Output: Occupancy predictions 预测的占用情况
8.0 [8.0] 2503.22932 Bi-Level Multi-View fuzzy Clustering with Exponential Distance
[{'name': 'Kristina P. Sinaga'}]
Multi-view Clustering 多视角聚类 v2
multi-view clustering
fuzzy c-means
exponential distance
Input: Multi-view data 多视角数据
Step1: Extend fuzzy c-means clustering 扩展模糊c均值聚类
Step2: Incorporate heat-kernel coefficients 引入热核系数
Step3: Develop bi-level clustering algorithm 开发双层聚类算法
Output: Enhanced clustering results 改进的聚类结果
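The core ingredient above (fuzzy c-means under an exponential distance) can be sketched compactly. The distance d² = 1 − exp(−β‖x − c‖²) and the kernel-weighted centre update are one plausible reading of "exponential distance", not the paper's exact bi-level formulation, and the farthest-point initialisation is a convenience for determinism.

```python
import numpy as np

def exp_fcm(X, k=2, m=2.0, beta=0.5, iters=50):
    """Fuzzy c-means with a Gaussian-kernel exponential distance (a sketch)."""
    # Deterministic farthest-point initialisation of the k centres.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    C = np.array(centers, dtype=float)
    for _ in range(iters):
        sq = ((X[:, None, :] - C[None]) ** 2).sum(-1)   # (n, k) squared dists
        kern = np.exp(-beta * sq)
        d2 = 1.0 - kern + 1e-12                         # exponential distance
        u = d2 ** (-1.0 / (m - 1.0))
        u /= u.sum(axis=1, keepdims=True)               # fuzzy memberships
        w = (u ** m) * kern
        C = (w.T @ X) / w.sum(axis=0)[:, None]          # kernel centre update
    return u, C
```

Each point's row of `u` holds its soft cluster memberships; `u.argmax(1)` yields hard labels when needed.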
7.5 [7.5] 2503.23131 RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning
[{'name': 'Alexander Vogel, Omar Moured, Yufan Chen, Jiaming Zhang, Rainer Stiefelhagen'}]
VLM & VLA 视觉语言模型与视觉语言对齐 v2
Vision-Language Models
chart understanding
visual grounding
Input: Chart images 图表图像
Step1: Data collection 数据收集
Step2: Instruction tuning 指令调优
Step3: Visual grounding implementation 视觉定位实现
Output: RefChartQA dataset and model outputs RefChartQA数据集及模型输出
7.5 [7.5] 2503.23452 VideoGen-Eval: Agent-based System for Video Generation Evaluation
[{'name': 'Yuhang Yang, Ke Fan, Shangkun Sun, Hongxiang Li, Ailing Zeng, FeiLin Han, Wei Zhai, Wei Liu, Yang Cao, Zheng-Jun Zha'}]
Image and Video Generation 图像生成与视频生成 v2
video generation
evaluation system
Input: Video generation prompts 视频生成提示
Step1: Content structuring 内容结构
Step2: Content judgment 内容评估
Step3: Dynamic evaluation tools 动态评估工具
Output: Evaluation results 评估结果
7.5 [7.5] 2503.23573 DASH: Detection and Assessment of Systematic Hallucinations of VLMs
[{'name': 'Maximilian Augustin, Yannic Neuhaus, Matthias Hein'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
VLMs
object hallucination
evaluation
Input: Real-world images 真实世界的图像
Step1: Data retrieval 数据检索
Step2: Systematic hallucination detection 系统性幻觉检测
Output: Clusters of hallucinated images 幻觉图像的聚类
6.5 [6.5] 2503.23508 Re-Aligning Language to Visual Objects with an Agentic Workflow
[{'name': 'Yuming Chen, Jiangyan Feng, Haodong Zhang, Lijun Gong, Feng Zhu, Rui Zhao, Qibin Hou, Ming-Ming Cheng, Yibing Song'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
language-object alignment
vision-language models
Input: Detected objects and raw language expressions 检测到的对象和原始语言表达
Step1: Reasoning state and planning 状态推理与规划
Step2: Adaptive prompt adjustment 自适应提示调整
Step3: Feedback analysis from LLM LLM反馈分析
Output: Re-aligned language expressions 重新对齐的语言表达

Arxiv 2025-04-01

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2503.22986 FreeSplat++: Generalizable 3D Gaussian Splatting for Efficient Indoor Scene Reconstruction
[{'name': 'Yunsong Wang, Tianxin Huang, Hanlin Chen, Gim Hee Lee'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
indoor scenes
Input: Multi-view images 多视角图像
Step1: Low-cost Cross-View Aggregation framework 低成本交叉视图聚合框架
Step2: Pixel-wise triplet fusion method 像素级三元组融合方法
Step3: Weighted floater removal strategy 加权漂浮物去除策略
Step4: Depth-regularized per-scene fine-tuning 深度规则化每场景微调
Output: Enhanced 3D Gaussian primitives 改进的三维高斯原语
9.5 [9.5] 2503.23044 CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction
[{'name': 'Yuanyuan Gao, Hao Li, Jiaqi Chen, Zhengyu Zou, Zhihang Zhong, Dingwen Zhang, Xiao Sun, Junwei Han'}]
3D Reconstruction and Modeling 3D重建与建模 v2
large-scale scene reconstruction 大规模场景重建
3D Gaussian Splatting 3D高斯点云
multi-GPU rendering 多GPU渲染
Input: Multi-view images 多视角图像
Step1: Dynamic voxel allocation 动态体素分配
Step2: Batch rendering techniques 批量渲染技术
Step3: Parallel training and rendering 并行训练与渲染
Output: High-fidelity 3D models 高保真3D模型
9.5 [9.5] 2503.23162 NeuralGS: Bridging Neural Fields and 3D Gaussian Splatting for Compact 3D Representations
[{'name': 'Zhenyu Tang, Chaoran Feng, Xinhua Cheng, Wangbo Yu, Junwu Zhang, Yuan Liu, Xiaoxiao Long, Wenping Wang, Li Yuan'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D Gaussian Splatting 3D高斯点云
Neural Rendering 神经渲染
3D Reconstruction 三维重建
Input: Original 3D Gaussian Splatting 3DGS 原始3D高斯点云
Step1: Importance calculation 重要性计算
Step2: Gaussian clustering 高斯聚类
Step3: Tiny MLP fitting 小型多层感知机拟合
Output: Compact 3D representation 紧凑的3D表示
9.5 [9.5] 2503.23282 AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos
[{'name': 'Felix Wimbauer, Weirong Chen, Dominik Muhle, Christian Rupprecht, Daniel Cremers'}]
3D Reconstruction and Modeling 三维重建 v2
camera poses
intrinsics
3D reconstruction
dynamic scenes
Input: Dynamic video sequences 动态视频序列
Step1: Predict camera poses and intrinsics 预测相机姿态和内参
Step2: Apply uncertainty-based loss formulation 应用基于不确定性的损失函数
Step3: Perform trajectory refinement 进行轨迹优化
Output: High-quality 4D pointclouds 高质量4D点云
9.5 [9.5] 2503.23297 ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning
[{'name': 'Zhenyang Liu, Yikai Wang, Sixiao Zheng, Tongying Pan, Longfei Liang, Yanwei Fu, Xiangyang Xue'}]
3D Visual Grounding 3D视觉定位 v2
3D grounding
language models
Gaussian features
Input: Implicit language descriptions 隐式语言描述
Step1: Use 3D Gaussian feature fields 使用3D高斯特征场
Step2: Adaptive grouping based on object scale 根据物体尺度进行自适应分组
Step3: Localize occluded objects 进行遮挡物体定位
Output: Enhanced 3D grounding 改进的三维定位
9.5 [9.5] 2503.23463 OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model
[{'name': 'Xingcheng Zhou, Xuyuan Han, Feng Yang, Yunpu Ma, Alois C. Knoll'}]
Autonomous Driving 自动驾驶 v2
Vision-Language Models 视觉语言模型
Autonomous Driving 自动驾驶
Input: Multimodal inputs including 3D environmental perception 3D环境感知, ego vehicle states 自我车辆状态, and driver commands 驾驶员命令
Step1: Hierarchical vision-language alignment process 层次化视觉语言对齐过程
Step2: Model generates driving trajectories 生成驾驶轨迹
Step3: Evaluate agent-env-ego interactions 评估主体-环境-自我交互
Output: Reliable driving actions 可靠的驾驶动作
9.5 [9.5] 2503.23502 Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model
[{'name': 'Jannik Endres, Oliver Hahn, Charles Corbière, Simone Schaub-Meyer, Stefan Roth, Alexandre Alahi'}]
Stereo Matching 立体匹配 v2
omnidirectional stereo matching
depth estimation
mobile robotics
3D reconstruction
Input: Equirectangular images captured with two vertically stacked omnidirectional cameras 通过两个垂直堆叠的全向相机采集的等距图像
Step1: Integrate depth foundation model into stereo matching architecture 将深度基础模型集成到立体匹配架构中
Step2: Two-stage training: Adapt stereo matching head and fine-tune foundation model 两阶段训练:调整立体匹配头和微调基础模型
Step3: Evaluate performance on real-world datasets 在真实数据集上评估性能
Output: Enhanced disparity estimation and 3D depth maps 改进的视差估计和三维深度图
9.5 [9.5] 2503.23664 LiM-Loc: Visual Localization with Dense and Accurate 3D Reference Maps Directly Corresponding 2D Keypoints to 3D LiDAR Point Clouds
[{'name': 'Masahiko Tsuji, Hitoshi Niigaki, Ryuichi Tanida'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D reconstruction
LiDAR
Input: 2D keypoints from reference images 参考图像中的2D关键点
Step1: Generate 3D reference map using 3D reconstruction 使用3D重建生成3D参考地图
Step2: Assign 3D LiDAR point clouds to keypoints 将3D LiDAR点云分配给关键点
Step3: Improve pose estimation accuracy 提高姿态估计准确性
Output: Dense and accurate 3D reference maps 密集且准确的3D参考地图
9.5 [9.5] 2503.23670 Learning Bijective Surface Parameterization for Inferring Signed Distance Functions from Sparse Point Clouds with Grid Deformation
[{'name': 'Takeshi Noda, Chao Chen, Junsheng Zhou, Weiqi Zhang, Yu-Shen Liu, Zhizhong Han'}]
3D Reconstruction and Modeling 三维重建 v2
Surface Reconstruction 表面重建
Signed Distance Functions 有符号距离函数
Sparse Point Clouds 稀疏点云
Input: Sparse point clouds 稀疏点云
Step1: Learn dynamic deformation network 学习动态变形网络
Step2: Bijective surface parameterization (BSP) learning 学习双射表面参数化
Step3: Grid deformation optimization (GDO) 应用网格变形优化
Output: Continuous signed distance functions (SDF) 生成连续的有符号距离函数
9.5 [9.5] 2503.23684 Detail-aware multi-view stereo network for depth estimation
[{'name': 'Haitao Tian, Junyang Li, Chenxing Wang, Helong Jiang'}]
Multi-view Stereo 多视角立体 v2
depth estimation
multi-view stereo
3D reconstruction
image synthesis
Input: Multi-view images 多视角图像
Step1: Data integration 数据集成
Step2: Geometric depth embedding 几何深度嵌入
Step3: Adaptive depth interval adjustment 自适应深度间隔调整
Output: Accurate depth maps 精确的深度图
9.5 [9.5] 2503.23747 Consistency-aware Self-Training for Iterative-based Stereo Matching
[{'name': 'Jingyi Zhou, Peng Ye, Haoyu Zhang, Jiakang Yuan, Rao Qiang, Liu YangChenXu, Wu Cailin, Feng Xu, Tao Chen'}]
Stereo Matching 立体匹配 v2
stereo matching
self-training
depth estimation
computer vision
autonomous driving
Input: Stereo image pairs 立体图像对
Step1: Reliability evaluation 可靠性评估
Step2: Soft filtering of pseudo-labels 伪标签软过滤
Step3: Model training with weighted loss 使用加权损失进行模型训练
Output: Enhanced stereo matching results 改进的立体匹配结果
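The soft filtering of pseudo-labels in Step2/Step3 can be sketched generically (an illustrative self-training loss, not the paper's exact formulation; `tau` and `sharpness` are made-up parameters):

```python
import numpy as np

def soft_filter_weights(confidence, tau=0.5, sharpness=10.0):
    """Map per-pixel pseudo-label confidence to soft loss weights.

    Instead of hard-thresholding pseudo-labels, low-confidence
    predictions are down-weighted smoothly (sigmoid around tau).
    """
    return 1.0 / (1.0 + np.exp(-sharpness * (confidence - tau)))

def weighted_l1_loss(pred, pseudo_label, confidence):
    """L1 loss against pseudo-labels, weighted by soft reliability."""
    w = soft_filter_weights(confidence)
    return np.sum(w * np.abs(pred - pseudo_label)) / (np.sum(w) + 1e-8)
```

The soft weighting keeps every pseudo-label in play while letting reliability control its influence, avoiding the hard cutoff of a fixed confidence threshold.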
9.5 [9.5] 2503.23881 ExScene: Free-View 3D Scene Reconstruction with Gaussian Splatting from a Single Image
[{'name': 'Tianyi Gong, Boyan Li, Yifei Zhong, Fangxin Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
single-view image reconstruction
Gaussian Splatting
panoramic image generation
Input: Single-view image 单视图图像
Step1: Generate panoramic image 生成全景图像
Step2: Depth estimation 深度估计
Step3: Train initial 3D Gaussian Splatting model 训练初始3D高斯点云模型
Step4: GS refinement with video diffusion priors 使用视频扩散先验进行GS优化
Output: Enhanced immersive 3D scene 改进的沉浸式3D场景
9.5 [9.5] 2503.23882 GLane3D : Detecting Lanes with Graph of 3D Keypoints
[{'name': 'Halil İbrahim Öztürk, Muhammet Esat Kalfaoğlu, Ozsel Kilinc'}]
3D Reconstruction and Modeling 三维重建 v2
3D lane detection
autonomous driving
Input: 3D lane data 3D车道数据
Step1: Keypoint detection 关键点检测
Step2: Connection prediction 连接预测
Step3: Lane construction 车道构建
Output: 3D lanes 3D车道
9.5 [9.5] 2503.23963 A Benchmark for Vision-Centric HD Mapping by V2I Systems
[{'name': 'Miao Fan, Shanshan Yu, Shengtong Xu, Kun Jiang, Haoyi Xiong, Xiangzeng Liu'}]
Autonomous Driving 自动驾驶 v2
HD mapping
vehicle-to-infrastructure
autonomous driving
Input: Collaborative camera frames from vehicles and infrastructure 车辆和基础设施的协作摄像头帧
Step1: Data collection and annotation 数据收集和标注
Step2: Feature extraction 特征提取
Step3: Map encoding and decoding 地图编码和解码
Output: Vectorized high-definition maps 向量化高精度地图
9.5 [9.5] 2503.23965 Video-based Traffic Light Recognition by Rockchip RV1126 for Autonomous Driving
[{'name': 'Miao Fan, Xuxu Kong, Shengtong Xu, Haoyi Xiong, Xiangzeng Liu'}]
Autonomous Driving 自动驾驶 v2
traffic light recognition
autonomous driving
real-time processing
end-to-end neural network
Input: Video frames from ego cameras 来自自车摄像头的视频帧
Step1: Multi-frame processing 多帧处理
Step2: Traffic light detection and classification 交通信号灯检测和分类
Step3: Integration with HD maps 与高清地图集成
Output: Real-time traffic light recognition 实时交通信号灯识别
9.5 [9.5] 2503.23993 DenseFormer: Learning Dense Depth Map from Sparse Depth and Image via Conditional Diffusion Model
[{'name': 'Ming Yuan, Sichao Wang, Chuang Zhang, Lei He, Qing Xu, Jianqiang Wang'}]
Depth Estimation 深度估计 v2
depth completion 深度补全
autonomous driving 自动驾驶
conditional diffusion model 条件扩散模型
Input: Sparse depth maps and RGB images 稀疏深度图和RGB图像
Step1: Data integration 数据集成
Step2: Conditional depth denoising using diffusion model 条件深度去噪
Step3: Multi-step iterative refinement 多步迭代优化
Output: Enhanced dense depth maps 改进的稠密深度图
9.5 [9.5] 2503.24210 DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting
[{'name': 'Seungjun Lee, Gim Hee Lee'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D Gaussian Splatting
motion deblurring
event streams
novel view synthesis
Input: Blurry multi-view images and event streams 模糊多视角图像及事件流
Step1: Optimize deblurring 3DGS using event streams and diffusion prior 优化去模糊3DGS,利用事件流和扩散先验
Step2: Enhance edge details and color accuracy 强化边缘细节和颜色准确性
Output: Improved sharp 3D representations 改进的锐利3D表示
9.5 [9.5] 2503.24366 StochasticSplats: Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting
[{'name': 'Shakiba Kheradmand, Delio Vicini, George Kopanas, Dmitry Lagun, Kwang Moo Yi, Mark Matthews, Andrea Tagliasacchi'}]
Neural Rendering 神经渲染 v2
3D Gaussian splatting
stochastic rasterization
neural rendering
volume rendering
Input: 3D Gaussian splatting data 3D高斯点云数据
Step1: Integrate stochastic rasterization techniques 整合随机光栅化技术
Step2: Implement unbiased Monte Carlo estimator 实现无偏蒙特卡洛估计器
Step3: Optimize rendering performance 优化渲染性能
Output: Efficient 3D rendering output 高效的三维渲染输出
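The sorting-free idea behind StochasticSplats can be illustrated with a classic stochastic-transparency estimator (a hedged single-pixel sketch, not the paper's renderer): each fragment survives a Bernoulli(alpha) test and the nearest survivor wins the sample, so the per-fragment win probability is alpha_i times the product of (1 - alpha_j) over nearer fragments, making the sample mean an unbiased estimate of sorted alpha compositing.

```python
import numpy as np

def sorted_composite(colors, alphas, depths):
    """Reference: standard front-to-back alpha compositing (needs sorting)."""
    order = np.argsort(depths)
    out, transmittance = np.zeros(3), 1.0
    for i in order:
        out += transmittance * alphas[i] * colors[i]
        transmittance *= 1.0 - alphas[i]
    return out

def stochastic_composite(colors, alphas, depths, n_samples=200_000, seed=0):
    """Sorting-free Monte Carlo estimate via stochastic transparency."""
    rng = np.random.default_rng(seed)
    n = len(alphas)
    survive = rng.random((n_samples, n)) < alphas      # Bernoulli(alpha) tests
    masked_depth = np.where(survive, depths, np.inf)   # hide dead fragments
    winner = np.argmin(masked_depth, axis=1)           # nearest survivor wins
    any_survivor = survive.any(axis=1)
    acc = colors[winner] * any_survivor[:, None]       # black background
    return acc.mean(axis=0)
```

Averaging over samples trades sorting cost for variance, which is the rasterization-friendly trade the paper's title alludes to.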
9.5 [9.5] 2503.24374 ERUPT: Efficient Rendering with Unposed Patch Transformer
[{'name': 'Maxim V. Shugaev, Vincent Chen, Maxim Karrenbach, Kyle Ashley, Bridget Kennedy, Naresh P. Cuntoor'}]
3D Reconstruction and Modeling 3D重建与建模 v2
3D reconstruction 3D重建
novel view synthesis 新视图合成
efficient rendering 高效渲染
Input: Collections of RGB images RGB图像集
Step1: Patch-based querying using unposed imagery 基于补丁的查询,使用未定位的图像
Step2: Model training with learned latent camera pose 模型训练,使用学习到的潜在相机姿态
Step3: Efficient rendering at high frame rates 实现高帧率的高效渲染
Output: Novel view synthesis of 3D scenes 3D场景的新视图合成
9.5 [9.5] 2503.24382 Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views
[{'name': 'Chong Bao, Xiyu Zhang, Zehao Yu, Jiale Shi, Guofeng Zhang, Songyou Peng, Zhaopeng Cui'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
unbounded view synthesis
neural rendering
Input: Extremely sparse views (3-4) 极少视角输入(3-4)
Step1: Employ dense stereo reconstruction model to recover coarse geometry 使用密集立体重建模型恢复粗略几何
Step2: Introduce layered Gaussian-based representation to model scenes 引入分层高斯表示来建模场景
Step3: Perform bootstrap optimization for noise refinement and occlusion filling 进行引导优化以消除噪声和填补遮挡区域
Step4: Implement iterative fusion of reconstruction and generation 进行重建与生成的迭代融合
Output: High-quality 3D reconstruction and novel view synthesis 输出:高质量的三维重建和新视图合成
9.5 [9.5] 2503.24391 Easi3R: Estimating Disentangled Motion from DUSt3R Without Training
[{'name': 'Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, Anpei Chen'}]
3D Reconstruction and Modeling 三维重建 v2
4D reconstruction
camera pose estimation
dynamic scenes
Input: Dynamic video footage 动态视频
Step1: Attention map analysis 注意力图分析
Step2: Motion disentanglement 运动解耦
Step3: Point cloud reconstruction 点云重建
Output: Segmented dynamic regions and camera parameters 分割的动态区域和相机参数
9.2 [9.2] 2503.23024 Empowering Large Language Models with 3D Situation Awareness
[{'name': 'Zhihao Yuan, Yibo Peng, Jinke Ren, Yinghong Liao, Yatong Han, Chun-Mei Feng, Hengshuang Zhao, Guanbin Li, Shuguang Cui, Zhen Li'}]
3D Scene Understanding 3D场景理解 v2
3D situation awareness 3D情境意识
Vision-Language Models 视觉语言模型
Large Language Models 大型语言模型
Input: RGB-D videos RGB-D 视频
Step1: Data collection 数据收集
Step2: Dataset generation 数据集生成
Step3: Situation grounding module integration 情境基础模块集成
Output: Enhanced 3D situational awareness 改进的三维情境感知
9.2 [9.2] 2503.23109 Uncertainty-Instructed Structure Injection for Generalizable HD Map Construction
[{'name': 'Xiaolu Liu, Ruizi Yang, Song Wang, Wentong Li, Junbo Chen, Jianke Zhu'}]
Autonomous Driving 自动驾驶 v2
HD map construction
autonomous driving
Input: Images from onboard cameras 车载摄像头图像
Step 1: Feature extraction 特征提取
Step 2: Uncertainty-aware detection 不确定性感知检测
Step 3: Map vectorization 地图向量化
Output: Generalized HD maps 泛化的高清地图
9.2 [9.2] 2503.23313 SpINR: Neural Volumetric Reconstruction for FMCW Radars
[{'name': 'Harshvardhan Takawale, Nirupam Roy'}]
3D Reconstruction and Modeling 三维重建 v2
volumetric reconstruction
neural representation
radar imaging
Input: FMCW radar data 频率调制连续波雷达数据
Step1: Construct frequency-domain model 构建频率域模型
Step2: Integrate neural representations 集成神经表示
Step3: Perform volumetric reconstruction 进行体积重建
Output: High-resolution 3D scenes 高分辨率3D场景
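The frequency-domain model in Step1 rests on the standard FMCW relation between target range and beat frequency, f_b = 2·R·S/c (textbook radar physics, not the paper's neural formulation):

```python
C = 299_792_458.0  # speed of light, m/s

def beat_frequency(target_range_m, chirp_slope_hz_per_s):
    """For an FMCW chirp of slope S (Hz/s), a target at range R
    produces a beat frequency f_b = 2*R*S/c in the de-chirped signal."""
    return 2.0 * target_range_m * chirp_slope_hz_per_s / C

def range_from_beat(beat_hz, chirp_slope_hz_per_s):
    """Invert the relation: R = f_b * c / (2*S)."""
    return beat_hz * C / (2.0 * chirp_slope_hz_per_s)
```

For example, a 30 MHz/µs chirp and a target at 10 m give a beat frequency of about 2 MHz, which is why range resolution maps directly to frequency resolution in the de-chirped spectrum.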
9.2 [9.2] 2503.24229 Pre-training with 3D Synthetic Data: Learning 3D Point Cloud Instance Segmentation from 3D Synthetic Scenes
[{'name': 'Daichi Otsuka, Shinichi Mae, Ryosuke Yamada, Hirokatsu Kataoka'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D point cloud
instance segmentation
synthetic data
autonomous driving
Input: 3D point cloud data 3D点云数据
Step1: Pre-training with synthetic data 使用合成数据进行预训练
Step2: Instance segmentation model training 实例分割模型训练
Step3: Evaluation of segmentation performance 分割性能评估
Output: Enhanced 3D instance segmentation model 改进的三维实例分割模型
9.0 [9.0] 2503.23337 Enhancing 3D Gaussian Splatting Compression via Spatial Condition-based Prediction
[{'name': 'Jingui Ma, Yang Hu, Luyang Tang, Jiayu Yang, Yongqi Zhai, Ronggang Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
compression
prediction technique
Input: 3D Gaussian Splatting data 3D高斯点云数据
Step1: Introduce prediction technique 引入预测技术
Step2: Compress using spatial condition 基于空间条件进行压缩
Step3: Model evaluation using residuals 采用残差进行模型评估
Output: Compressed Gaussian representation 压缩的高斯表示
8.5 [8.5] 2503.22976 From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D
[{'name': 'Jiahui Zhang, Yurui Chen, Yanpeng Zhou, Yueming Xu, Ze Huang, Jilin Mei, Junhui Chen, Yu-Jie Yuan, Xinyue Cai, Guowei Huang, Xingyue Quan, Hang Xu, Li Zhang'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
3D reasoning
spatial perception
Input: 2D spatial data 2D空间数据
Step1: Data generation 数据生成
Step2: Annotation pipeline 注释管道
Step3: Model training 模型训练
Output: Improved performance on spatial tasks 改进的空间任务性能
8.5 [8.5] 2503.23022 MeshCraft: Exploring Efficient and Controllable Mesh Generation with Flow-based DiTs
[{'name': 'Xianglong He, Junyi Chen, Di Huang, Zexiang Liu, Xiaoshui Huang, Wanli Ouyang, Chun Yuan, Yangguang Li'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D reconstruction
mesh generation
efficient algorithms
Input: Raw meshes 原始网格
Step1: Encoding raw meshes into continuous tokens 将原始网格编码为连续的标记
Step2: Decoding tokens back into mesh structure 将标记解码为网格结构
Step3: Generating mesh topology using diffusion model 使用扩散模型生成网格拓扑
Output: High-quality, controllable mesh generation 高质量、可控的网格生成
8.5 [8.5] 2503.23365 OnSiteVRU: A High-Resolution Trajectory Dataset for High-Density Vulnerable Road Users
[{'name': 'Zhangcun Yan, Jianqing Li, Peng Hang, Jian Sun'}]
Autonomous Driving 自动驾驶 v2
Vulnerable Road Users
autonomous driving
trajectory dataset
Input: High-resolution trajectory data 高分辨率轨迹数据
Step1: Dataset development 数据集开发
Step2: Data integration 数据集成
Step3: Evaluation of trajectory coverage 轨迹覆盖评估
Output: Comprehensive VRU behavior representation 全面的VRU行为表现
8.5 [8.5] 2503.23368 Towards Physically Plausible Video Generation via VLM Planning
[{'name': 'Xindi Yang, Baolu Li, Yiming Zhang, Zhenfei Yin, Lei Bai, Liqian Ma, Zhiyong Wang, Jianfei Cai, Tien-Tsin Wong, Huchuan Lu, Xu Jia'}]
Image and Video Generation 图像生成与视频生成 v2
video generation
vision-language models
physics-aware motion planning
Input: Image and text prompt 输入:图像和文本提示
Step1: VLM planning for motion trajectories VLM规划运动轨迹
Step2: Video generation with noise injection 噪声注入的视频生成
Output: Physically plausible videos 输出:物理上合理的视频
8.5 [8.5] 2503.23508 Re-Aligning Language to Visual Objects with an Agentic Workflow
[{'name': 'Yuming Chen, Jiangyan Feng, Haodong Zhang, Lijun Gong, Feng Zhu, Rui Zhao, Qibin Hou, Ming-Ming Cheng, Yibing Song'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
language-based object detection
vision-language models
data alignment
Input: Image with detected objects 带有检测对象的图像
Step1: Generate raw language expressions 生成原始语言表达
Step2: Plan actions based on agent's reasoning 根据代理的推理计划行动
Step3: Adjust image and text prompts 调整图像和文本提示
Output: Re-aligned expressions with improved accuracy 输出: 精确度提高的重新对齐表达
8.5 [8.5] 2503.23573 DASH: Detection and Assessment of Systematic Hallucinations of VLMs
[{'name': 'Maximilian Augustin, Yannic Neuhaus, Matthias Hein'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
object hallucinations
image retrieval
Input: Real-world images 真实世界图像
Step1: Image-based retrieval 基于图像的检索
Step2: Clustering similar images 聚类相似图像
Step3: Evaluation of hallucinations 评估幻觉
Output: Identified clusters of hallucinatory images 识别的幻觉图像簇
8.5 [8.5] 2503.23577 Multiview Image-Based Localization
[{'name': 'Cameron Fiore, Hongyi Fan, Benjamin Kimia'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
visual localization
multiview correspondence
autonomous systems
Input: Multiview images 多视角图像
Step1: Image feature extraction 图像特征提取
Step2: Relative translation estimate 计算相对平移
Step3: Optimal pose computation from multiview correspondences 从多视角对应中计算最优姿态
Output: Accurate localization results 准确定位结果
8.5 [8.5] 2503.23587 PhysPose: Refining 6D Object Poses with Physical Constraints
[{'name': 'Martin Malenický, Martin Cífka, Médéric Fourmy, Louis Montaut, Justin Carpentier, Josef Sivic, Vladimir Petrik'}]
6D Pose Estimation 6D对象姿态估计 v2
6D pose estimation
physical consistency
object-centric scene understanding
robotics
Input: Image and geometric description of the scene 图像和场景几何描述
Step1: Estimate initial poses for objects 估计对象的初始姿态
Step2: Post-processing optimization with physical constraints 引入物理约束的后处理优化
Output: Refined 6D object poses 改进的6D对象姿态
8.5 [8.5] 2503.23606 Blurry-Edges: Photon-Limited Depth Estimation from Defocused Boundaries
[{'name': 'Wei Xu, Charles James Wagner, Junjie Luo, Qi Guo'}]
Depth Estimation 深度估计 v2
depth estimation 深度估计
depth from defocus 失焦深度
photon-limited images 光子限制图像
Input: Defocused images 失焦图像
Step1: Image patch representation 图像块表示
Step2: Neural network prediction 神经网络预测
Step3: Depth calculation using DfD equation 使用DfD方程计算深度
Output: Depth maps 深度图
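The DfD equation in Step3 presumably builds on the classical thin-lens defocus model; a minimal sketch of that textbook relation (not the paper's learned estimator):

```python
def blur_diameter(s, f, aperture, s_focus):
    """Thin-lens blur-circle diameter for an object at distance s,
    focal length f, aperture diameter A, and focus distance s_focus:

        c = A * f * |s - s_focus| / (s * (s_focus - f))

    All distances in the same units; s, s_focus > f assumed.
    The blur vanishes at the focus plane and grows with defocus,
    which is the cue depth-from-defocus methods invert."""
    return aperture * f * abs(s - s_focus) / (s * (s_focus - f))
```

Inverting this relation from a measured blur radius gives two candidate depths (one on each side of the focus plane), which is why DfD methods typically use two differently focused images or learned priors to disambiguate.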
8.5 [8.5] 2503.23647 Introducing the Short-Time Fourier Kolmogorov Arnold Network: A Dynamic Graph CNN Approach for Tree Species Classification in 3D Point Clouds
[{'name': 'Said Ohamouddoua, Mohamed Ohamouddoub, Rafik Lasrib, Hanaa El Afiaa, Raddouane Chiheba, Abdellatif El Afiaa'}]
3D Reconstruction and Modeling 三维重建 v2
3D Point Cloud
Tree Species Classification
Kolmogorov-Arnold Network
Input: 3D point clouds from TLS and ALS 三维激光扫描点云
Step1: Implement Short-Time Fourier Transform (STFT) 使用短时傅里叶变换
Step2: Develop lightweight Dynamic Graph CNN (liteDGCNN) 开发轻量级动态图卷积神经网络
Step3: Evaluate performance and parameter reduction 评估性能和参数减少
Output: Classified tree species with reduced model complexity 输出:分类树种并降低模型复杂度
8.5 [8.5] 2503.23702 3D Dental Model Segmentation with Geometrical Boundary Preserving
[{'name': 'Shufan Xi, Zexian Liu, Junlin Chang, Hongyu Wu, Xiaogang Wang, Aimin Hao'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
tooth segmentation
digital dentistry
Input: 3D intraoral scan mesh 3D口腔扫描网格
Step1: Selective downsampling 选择性下采样
Step2: Boundary feature extraction 边界特征提取
Step3: Model evaluation 模型评估
Output: Improved tooth segmentation 提高的牙齿分割效果
8.5 [8.5] 2503.23980 SALT: A Flexible Semi-Automatic Labeling Tool for General LiDAR Point Clouds with Cross-Scene Adaptability and 4D Consistency
[{'name': 'Yanbo Wang, Yongtao Chen, Chuan Cao, Tianchen Deng, Wentao Zhao, Jingchuan Wang, Weidong Chen'}]
3D Reconstruction and Modeling 三维重建 v2
LiDAR
zero-shot learning
semi-automatic labeling
Input: Raw LiDAR data 原始激光雷达数据
Step1: Data transformation 数据转换
Step2: Zero-shot learning application 零样本学习应用
Step3: Pre-segmentation generation 预分割生成
Output: High-quality pre-segmented LiDAR data 高质量预分割的激光雷达数据
8.5 [8.5] 2503.24091 4D mmWave Radar in Adverse Environments for Autonomous Driving: A Survey
[{'name': 'Xiangyuan Peng, Miao Tang, Huawei Sun, Lorenzo Servadei, Robert Wille'}]
Autonomous Driving 自动驾驶 v2
4D mmWave radar
autonomous driving
perception
Input: 4D mmWave radar data 4D毫米波雷达数据
Step1: Review existing datasets 现有数据集综述
Step2: Analyze methods for perception and SLAM 感知与SLAM方法分析
Step3: Discuss challenges and future directions 挑战与未来方向讨论
Output: Comprehensive survey report 综合调查报告
8.5 [8.5] 2503.24270 Visual Acoustic Fields
[{'name': 'Yuelei Li, Hyunjin Kim, Fangneng Zhan, Ri-Zhao Qiu, Mazeyu Ji, Xiaojun Shan, Xueyan Zou, Paul Liang, Hanspeter Pfister, Xiaolong Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
Visual Acoustic Fields
cross-modal learning
Input: Visual and acoustic data 视觉和声学数据
Step1: Collect multiview images 收集多视角图像
Step2: Record impact sounds 记录撞击声音
Step3: Use structure-from-motion for camera pose estimation 使用运动恢复结构估计相机姿态
Step4: Implement 3D Gaussian Splatting for scene reconstruction 使用3D高斯点云重建场景
Step5: Generate sound based on visual cues 使用视觉线索生成声音
Step6: Localize sound sources within the scene 确定场景内声音源的位置
Output: Aligned visual-sound pairs 输出对齐的视觉-声音对
8.5 [8.5] 2503.24381 UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving
[{'name': 'Yuping Wang, Xiangyu Huang, Xiaokang Sun, Mingxuan Yan, Shuo Xing, Zhengzhong Tu, Jiachen Li'}]
Occupancy Forecasting and Prediction in Autonomous Driving 自动驾驶中的占用预测 v2
Occupancy Forecasting 占用预测
Autonomous Driving 自动驾驶
3D Prediction 三维预测
Input: Multi-view images 多视角图像
Step1: Data unification 数据统一
Step2: Novel metric development 新指标开发
Step3: Algorithm validation 算法验证
Output: Unified occupancy predictions 统一的占用预测
8.0 [8.0] 2503.24129 It's a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data
[{'name': 'Dominik Schnaus, Nikita Araslanov, Daniel Cremers'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Correspondence 视觉-语言对应
Unsupervised Learning 无监督学习
Input: Vision and language embeddings 视觉和语言嵌入
Step1: Formulate matching as a quadratic assignment problem 将匹配公式化为二次分配问题
Step2: Develop a heuristic solver 发展启发式解法
Step3: Conduct extensive empirical study 开展广泛的实证研究
Output: Unsupervised classification outcomes 无监督分类结果
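The quadratic assignment formulation in Step1 can be prototyped with SciPy's approximate QAP solver (an illustrative sketch assuming cosine-similarity Gram matrices; `blind_match` is a hypothetical helper, not the paper's solver):

```python
import numpy as np
from scipy.optimize import quadratic_assignment

def blind_match(vision_emb, text_emb):
    """Match two embedding sets without any paired supervision by
    aligning their pairwise-similarity structure, posed as a QAP:
    maximize trace(A @ P @ B @ P.T) over permutation matrices P,
    where A and B are within-modality cosine-similarity matrices."""
    def gram(X):
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        return Xn @ Xn.T

    A, B = gram(vision_emb), gram(text_emb)
    res = quadratic_assignment(A, B, options={"maximize": True})
    # res.col_ind[i] = index in text_emb matched to vision_emb[i]
    return res.col_ind
```

SciPy's default FAQ method is an approximate solver, so the recovered permutation is a heuristic solution rather than a certified optimum, matching the "heuristic solver" framing in Step2.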
7.5 [7.5] 2503.23105 Open-Vocabulary Semantic Segmentation with Uncertainty Alignment for Robotic Scene Understanding in Indoor Building Environments
[{'name': 'Yifan Xu, Vineet Kamat, Carol Menassa'}]
Autonomous Systems and Robotics 自主系统与机器人 v2
semantic segmentation
robotic navigation
Input: Scene images 场景图像
Step1: Segment different rooms/regions of the scene 划分场景中的不同房间/区域
Step2: Leverage VLM to get similarity scores between descriptions and rooms 利用视觉语言模型获得描述与房间之间的相似度分数
Step3: Use adaptive conformal prediction (ACP) to select rooms according to similarity scores 使用自适应的保形预测根据相似度分数选择房间
Output: Enhanced robot navigation capabilities 提升机器人导航能力
7.5 [7.5] 2503.23200 A GAN-Enhanced Deep Learning Framework for Rooftop Detection from Historical Aerial Imagery
[{'name': 'Pengyu Chen, Sicheng Wang, Cuizhen Wang, Senrong Wang, Beiao Huang, Lu Huang, Zhe Zang'}]
Image Generation 图像生成 v2
rooftop detection
historical imagery
Generative Adversarial Networks
image enhancement
Input: Historical aerial imagery 历史航空图像
Step1: Image colorization using DeOldify 图像上色采用DeOldify
Step2: Super-resolution enhancement using Real-ESRGAN 超分辨率增强采用Real-ESRGAN
Step3: Train rooftop detection models 训练屋顶检测模型
Output: Improved rooftop detection performance 改进的屋顶检测性能
7.5 [7.5] 2503.23388 COSMIC: Clique-Oriented Semantic Multi-space Integration for Robust CLIP Test-Time Adaptation
[{'name': 'Fanding Huang, Jingyan Jiang, Qinting Jiang, Hebei Li, Faisal Nadeem Khan, Zhi Wang'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
Test-Time Adaptation
Input: Test samples 测试样本
Step1: Cache refinement 缓存改进
Step2: Semantic graph construction 语义图构建
Step3: Hyper-class querying 超类查询
Output: Adapted predictions 适应性预测
7.5 [7.5] 2503.24306 Point Tracking in Surgery--The 2024 Surgical Tattoos in Infrared (STIR) Challenge
[{'name': 'Adam Schmidt, Mert Asim Karaoglu, Soham Sinha, Mingang Jang, Ho-Gun Ha, Kyungmin Jung, Kyeongmo Gu, Ihsan Ullah, Hyunki Lee, Jonáš Šerých, Michal Neoral, Jiří Matas, Rulin Zhou, Wenlong He, An Wang, Hongliang Ren, Bruno Silva, Sandro Queirós, Estêvão Lima, João L. Vilaça, Shunsuke Kikuchi, Atsushi Kouno, Hiroki Matsuzaki, Tongtong Li, Yulu Chen, Ling Li, Xiang Ma, Xiaojian Li, Mona Sheikh Zeinoddin, Xu Wang, Zafer Tandogdu, Greg Shaw, Evangelos Mazomenos, Danail Stoyanov, Yuxin Chen, Zijian Wu, Alexander Ladikos, Simon DiMaio, Septimiu E. Salcudean, Omid Mohareri'}]
3D Reconstruction and Modeling 三维重建 v2
point tracking
3D reconstruction
surgery
autonomous probe-based scanning
Input: Point tracking data for surgery 手术点跟踪数据
Step1: Challenge design 挑战设计
Step2: Algorithm submission and evaluation 算法提交与评估
Step3: Performance measurement based on accuracy and efficiency 性能测量基于准确性和效率
Output: Quantitative results for tracking algorithms 跟踪算法的定量结果

Arxiv 2025-03-31

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2503.21958 NeRF-based Point Cloud Reconstruction using a Stationary Camera for Agricultural Applications
[{'name': 'Kibon Ku, Talukder Z Jubery, Elijah Rodriguez, Aditya Balu, Soumik Sarkar, Adarsh Krishnamurthy, Baskar Ganapathysubramanian'}]
3D Reconstruction 三维重建 v2
3D reconstruction
NeRF
point cloud
agriculture
Input: Images captured by a stationary camera 静态相机捕获的图像
Step1: COLMAP-based pose estimation COLMAP基础的姿态估计
Step2: Pose transformation to simulate camera movement 姿态转换以模拟相机移动
Step3: NeRF training using captured images 使用捕获的图像进行NeRF训练
Output: High-resolution point clouds 高分辨率点云
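The pose transformation in Step2 relies on a standard equivalence: rotating the object by +θ under a fixed camera produces the same image as rotating the camera-to-world pose by -θ around the same axis. A minimal sketch under the assumption of a z-axis turntable (illustrative, not the paper's code):

```python
import numpy as np

def rot_z(theta):
    """4x4 homogeneous rotation about the world z (turntable) axis."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    return T

def equivalent_camera_pose(base_c2w, turntable_angle):
    """A stationary camera viewing an object rotated by +theta on a
    turntable sees the same image as a camera whose camera-to-world
    pose is pre-rotated by -theta around the turntable axis."""
    return rot_z(-turntable_angle) @ base_c2w
```

Feeding these synthesized poses to a standard NeRF pipeline lets a single fixed camera stand in for a multi-view capture rig.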
9.5 [9.5] 2503.22060 Deep Depth Estimation from Thermal Image: Dataset, Benchmark, and Challenges
[{'name': 'Ukcheol Shin, Jinsun Park'}]
Depth Estimation 深度估计 v2
depth estimation
thermal imaging
autonomous driving
multi-modal dataset
robust perception
Input: Synchronized multi-modal data 包含同步的多模态数据
Step1: Dataset construction 数据集构建
Step2: Depth estimation evaluation 深度估计评估
Step3: Benchmark analysis 基准分析
Output: Standardized benchmark results 标准化基准结果
9.5 [9.5] 2503.22087 Mitigating Trade-off: Stream and Query-guided Aggregation for Efficient and Effective 3D Occupancy Prediction
[{'name': 'Seokha Moon, Janghyun Baek, Giseop Kim, Jinkyu Kim, Sunwook Choi'}]
3D Occupancy Prediction 3D占用预测 v2
3D occupancy prediction 3D占用预测
autonomous driving 自动驾驶
multi-view images 多视角图像
Input: Multi-view images 多视角图像
Step1: Stream-based Voxel Aggregation 流式体素聚合
Step2: Query-guided Aggregation 查询引导聚合
Step3: Model evaluation 模型评估
Output: 3D occupancy prediction 3D占用预测
9.5 [9.5] 2503.22154 Permutation-Invariant and Orientation-Aware Dataset Distillation for 3D Point Clouds
[{'name': 'Jae-Young Yim, Dongwook Kim, Jae-Young Sim'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D point clouds
dataset distillation
feature alignment
Input: 3D point clouds 3D点云
Step1: Permutation invariant feature matching 排列不变特征匹配
Step2: Orientation optimization 方向优化
Step3: Dataset distillation 数据集蒸馏
Output: Optimized synthetic dataset 优化的合成数据集
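Permutation invariance as in Step1 is commonly obtained by comparing point sets with an order-free metric such as Chamfer distance; a minimal NumPy sketch (illustrative, not the paper's matching module):

```python
import numpy as np

def chamfer_distance(X, Y):
    """Permutation-invariant distance between two point sets:
    mean nearest-neighbor distance in both directions. Reordering
    the points within either set leaves the value unchanged."""
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)  # (n, m)
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```

Because the metric matches each point only to its nearest neighbor, a distilled synthetic set can be optimized against real point clouds without ever fixing a point ordering.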
9.5 [9.5] 2503.22204 Segment then Splat: A Unified Approach for 3D Open-Vocabulary Segmentation based on Gaussian Splatting
[{'name': 'Yiren Lu, Yunlai Zhou, Yiran Qiao, Chaoda Song, Tuo Liang, Jing Ma, Yu Yin'}]
3D Reconstruction and Modeling 三维重建 v2
3D segmentation 3D分割
Gaussian Splatting 高斯点云
autonomous systems 自主系统
Input: Multi-view images 多视角图像
Step1: Object-specific Gaussian initialization 面向对象的高斯初始化
Step2: Segmentation via Gaussian Splatting 通过高斯点云分割
Step3: Optimization and scene reconstruction 优化和场景重建
Output: 3D object segmentation output 3D对象分割结果
9.5 [9.5] 2503.22231 CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving
[{'name': 'Yishen Ji, Ziyue Zhu, Zhenxin Zhu, Kaixin Xiong, Ming Lu, Zhiqi Li, Lijun Zhou, Haiyang Sun, Bing Wang, Tong Lu'}]
3D Generation 三维生成 v2
3D generation
autonomous driving
video generation
3D consistency
Input: HD maps and bounding boxes 用于视频生成的HD地图和边界框
Step1: Generate 3D conditions 生成3D条件
Step2: Develop spatially adaptive framework 开发空间自适应框架
Step3: Incorporate consistency adapter 添加一致性适配器
Output: High-quality driving videos 生成高质量的驾驶视频
9.5 [9.5] 2503.22324 AH-GS: Augmented 3D Gaussian Splatting for High-Frequency Detail Representation
[{'name': 'Chenyang Xu, XingGuo Deng, Rui Zhong'}]
3D Reconstruction 三维重建 v2
3D Gaussian Splatting
3D reconstruction
Novel View Synthesis
Input: Scene representation using 3D Gaussian Splatting 3D高斯点云
Step1: Enhance manifold complexity of input features 加强输入特征的流形复杂性
Step2: Implement Adaptive Frequency Encoding Module (AFEM) 实现自适应频率编码模块
Step3: Apply high-frequency reinforce loss 使用高频强化损失
Output: Improved rendering fidelity and high-frequency detail 改进的渲染保真度和高频细节
9.5 [9.5] 2503.22328 VoteFlow: Enforcing Local Rigidity in Self-Supervised Scene Flow
[{'name': 'Yancong Lin, Shiming Wang, Liangliang Nan, Julian Kooij, Holger Caesar'}]
Scene Flow Estimation 场景流估计 v2
scene flow
motion rigidity
autonomous driving
Input: LiDAR scans from autonomous driving applications LiDAR扫描
Step1: Data collection 数据收集
Step2: Implementation of a Voting Module 投票模块的实施
Step3: Scene flow estimation with local rigidity 利用局部刚性估计场景流
Output: Enhanced motion estimation 改进的运动估计
9.5 [9.5] 2503.22349 GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion
[{'name': 'Li-Heng Chen, Zi-Xin Zou, Chang Liu, Tianjiao Jing, Yan-Pei Cao, Shi-Sheng Huang, Hongbo Fu, Hua Huang'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
pose-free reconstruction
camera pose estimation
sparse view
surface reconstruction
Input: Unposed images 未标定图像
Step1: Implement Geometric Consistent Ray Diffusion model (GCRayDiffusion) 实施几何一致的射线扩散模型 (GCRayDiffusion)
Step2: Use triplane-based signed distance field (SDF) for learning 使用基于三平面的有符号距离场 (SDF) 进行学习
Step3: Improve camera pose estimation and surface reconstruction through neural rays 通过神经射线改善相机位姿估计和表面重建
Output: Accurate pose-free surface reconstruction results 精确的无位姿表面重建结果
9.5 [9.5] 2503.22430 MVSAnywhere: Zero-Shot Multi-View Stereo
[{'name': 'Sergio Izquierdo, Mohamed Sayed, Michael Firman, Guillermo Garcia-Hernando, Daniyar Turmukhambetov, Javier Civera, Oisin Mac Aodha, Gabriel Brostow, Jamie Watson'}]
Multi-view Stereo 多视角立体 v2
Multi-View Stereo
Depth Estimation
3D Reconstruction
Zero-Shot Learning
Input: Multiple posed RGB images 多个姿态的RGB图像
Step 1: Depth estimation using transformer architecture 使用Transformer架构进行深度估计
Step 2: Cost volume construction using geometric metadata 使用几何元数据构造成本体积
Step 3: Model evaluation and comparison with baselines 模型评估及与基线比较
Output: Accurate and 3D-consistent depth maps 输出:准确且三维一致的深度图
9.5 [9.5] 2503.22436 NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving
[{'name': 'Fuhao Li, Huan Jin, Bin Gao, Liaoyuan Fan, Lihui Jiang, Long Zeng'}]
3D Visual Grounding 视觉定位 v2
multi-view 3D visual grounding
autonomous driving
language grounding
object localization
Input: Multi-view images 多视角图像
Step1: Data integration 数据集成
Step2: Instruction processing 指令处理
Step3: Localization using 3D geometric information 使用3D几何信息进行定位
Output: Localized target objects 定位的目标对象
9.5 [9.5] 2503.22437 EndoLRMGS: Complete Endoscopic Scene Reconstruction combining Large Reconstruction Modelling and Gaussian Splatting
[{'name': 'Xu Wang, Shuai Zhang, Baoru Huang, Danail Stoyanov, Evangelos B. Mazomenos'}]
3D Reconstruction 三维重建 v2
3D Reconstruction
Endoscopic Surgery
Gaussian Splatting
Input: Endoscopic videos 内窥镜视频
Step1: Depth estimation 深度估计
Step2: Model generation 模型生成
Step3: Scene reconstruction 场景重建
Output: Complete 3D surgical scenes 完整的三维手术场景
9.5 [9.5] 2503.22537 LIM: Large Interpolator Model for Dynamic Reconstruction
[{'name': 'Remy Sabathier, Niloy J. Mitra, David Novotny'}]
Dynamic Reconstruction 动态重建 v2
4D reconstruction
implicit 3D representations
mesh tracking
Input: Implicit 3D representations at times t0 and t1
Step1: Interpolation using causal consistency loss
Step2: Mesh tracking across time
Output: High-speed tracked 4D assets
9.5 [9.5] 2503.22676 TranSplat: Lighting-Consistent Cross-Scene Object Transfer with 3D Gaussian Splatting
[{'name': 'Boyang (Tony) Yu, Yanlin Jin, Ashok Veeraraghavan, Akshat Dave, Guha Balakrishnan'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
Gaussian Splatting
object transfer
relighting
scene rendering
Input: Multi-view images containing objects from a source scene with delineating masks.
Step1: Fit a Gaussian Splatting model to both source and target scenes for object extraction and environment mapping.
Step2: Perform 3D object segmentation based on 2D masks to extract precise object geometry.
Step3: User-guided insertion of the extracted object into the target scene with automatic position and orientation refinement.
Step4: Calculate per-Gaussian radiance transfer functions via spherical harmonic analysis to adapt object's appearance for the target scene lighting.
Output: Realistically transferred 3D objects in the target scene.
9.5 [9.5] 2503.22677 DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness
[{'name': 'Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi'}]
3D Generation 三维生成 v2
3D reconstruction
physical stability
simulation feedback
Input: 3D object images 物体图像
Step1: Construct stability score dataset 构建稳定性评分数据集
Step2: Fine-tune 3D generator using stability scores 使用稳定性评分微调3D生成器
Step3: Evaluate physical stability 评估物理稳定性
Output: Physically stable 3D objects 物理稳定的3D对象
9.2 [9.2] 2503.22218 ABC-GS: Alignment-Based Controllable Style Transfer for 3D Gaussian Splatting
[{'name': 'Wenjie Liu, Zhongliang Liu, Xiaoyan Yang, Man Sha, Yang Li'}]
Neural Rendering 神经渲染 v2
3D style transfer
Neural Rendering
3D Gaussian Splatting
Input: Scene content and style images 场景内容和风格图像
Step1: Controllable matching of images 可控图像匹配
Step2: Feature alignment for style transfer 特征对齐以进行风格转换
Step3: Style transfer with depth preservation 保持深度的风格转换
Output: Stylized 3D scenes 风格化的三维场景
9.2 [9.2] 2503.22351 One Look is Enough: A Novel Seamless Patchwise Refinement for Zero-Shot Monocular Depth Estimation Models on High-Resolution Images
[{'name': 'Byeongjun Kwon, Munchurl Kim'}]
Depth Estimation 深度估计 v2
monocular depth estimation
high-resolution images
depth discontinuity
Input: High-resolution images 高分辨率图像
Step1: Grouped Patch Consistency Training 组块一致性训练
Step2: Bias Free Masking 去偏见掩码
Step3: Depth refinement on each patch 每个块的深度修正
Output: Accurate depth estimation results 准确的深度估计结果
8.5 [8.5] 2503.21830 Shape Generation via Weight Space Learning
[{'name': 'Maximilian Plattner, Arturs Berzins, Johannes Brandstetter'}]
3D Generation 三维生成 v2
3D shape generation
weight space learning
topology
geometry
phase transition
Input: 3D shape-generative model 3D形状生成模型
Step1: Analyze weight space 分析权重空间
Step2: Experiment with phase transitions 进行相变实验
Step3: Ensure controlled geometry changes 确保控制几何变化
Output: Enhanced shape generation capabilities 改进的形状生成能力
8.5 [8.5] 2503.22020 CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
[{'name': 'Qingqing Zhao, Yao Lu, Moo Jin Kim, Zipeng Fu, Zhuoyang Zhang, Yecheng Wu, Zhaoshuo Li, Qianli Ma, Song Han, Chelsea Finn, Ankur Handa, Ming-Yu Liu, Donglai Xiang, Gordon Wetzstein, Tsung-Yi Lin'}]
Robotics and Vision-Language Models 机器人和视觉语言模型 v2
vision-language-action models
robot manipulation
visual reasoning
chain-of-thought reasoning
Input: Visual-language-action models 视觉语言动作模型
Step1: Incorporate visual chain-of-thought reasoning 引入视觉思维链推理
Step2: Generate subgoal images 生成子目标图像
Step3: Predict action sequences 预测动作序列
Output: Enhanced robotic control capabilities 增强的机器人控制能力
8.5 [8.5] 2503.22093 How Well Can Vison-Language Models Understand Humans' Intention? An Open-ended Theory of Mind Question Evaluation Benchmark
[{'name': 'Ximing Wen, Mallika Mainali, Anik Sen'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
Theory of Mind
Visual Question Answering
Human Intentions
Multimodal Learning
Input: Visual scenarios and VLMs 输入:视觉场景和视觉语言模型
Step1: Develop open-ended question framework 开发开放式问题框架
Step2: Curate and annotate benchmark dataset 策划和注释基准数据集
Step3: Assess performance of VLMs 评估视觉语言模型的性能
Output: Evaluation results and insights 输出:评估结果和见解
8.5 [8.5] 2503.22194 ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation
[{'name': 'Yunhong Min, Daehyeon Choi, Kyeongmin Yeo, Jihyun Lee, Minhyuk Sung'}]
Image Generation 图像生成 v2
3D orientation grounding
text-to-image generation
Input: Text prompts and multi-view objects 文字提示和多视角对象
Step 1: 3D orientation estimation for multiple objects 多个对象的3D方向估计
Step 2: Reward-guided sampling using Langevin dynamics 使用Langevin动力学的奖励引导采样
Step 3: Model evaluation and comparison with existing methods 模型评估与现有方法对比
Output: 3D-oriented images 3D定向图像
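Reward-guided sampling with Langevin dynamics (Step 2 above) follows the generic update `x ← x + ε∇log r(x) + √(2ε)·noise`; a minimal sketch with a hypothetical Gaussian reward over a single orientation angle, not ORIGEN's actual objective:

```python
import numpy as np

def langevin_sample(grad_log_reward, x0, step=1e-2, n_steps=500, rng=None):
    """Unadjusted Langevin dynamics toward high-reward regions."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # Gradient ascent on log-reward plus injected Gaussian noise.
        x = x + step * grad_log_reward(x) + np.sqrt(2 * step) * noise
    return x

# Hypothetical reward: prefer an orientation angle near pi/4.
target = np.pi / 4
grad = lambda x: -(x - target)  # gradient of the log of a Gaussian reward

samples = [langevin_sample(grad, 0.0, rng=np.random.default_rng(s)) for s in range(64)]
```

The stationary distribution here is a unit-variance Gaussian centered on the target angle; in the paper's setting the gradient would come from an orientation-reward model instead of this toy quadratic.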
8.5 [8.5] 2503.22201 Multi-modal Knowledge Distillation-based Human Trajectory Forecasting
[{'name': 'Jaewoo Jeong, Seohee Lee, Daehee Park, Giwon Lee, Kuk-Jin Yoon'}]
Autonomous Systems and Robotics 自动驾驶 v2
trajectory forecasting
knowledge distillation
autonomous driving
multi-modal systems
Input: Limited modality student model 受限的模态学生模型
Step1: Train teacher model with full modalities 训练全模态教师模型
Step2: Distill knowledge to student model 从教师模型向学生模型蒸馏知识
Step3: Validate with datasets 验证数据集
Output: Enhanced prediction accuracy 改进的预测精度
8.5 [8.5] 2503.22209 Intrinsic Image Decomposition for Robust Self-supervised Monocular Depth Estimation on Reflective Surfaces
[{'name': 'Wonhyeok Choi, Kyumin Hwang, Minwoo Choi, Kiljoon Han, Wonjoon Choi, Mingyu Shin, Sunghoon Im'}]
Depth Estimation 深度估计 v2
monocular depth estimation
intrinsic image decomposition
self-supervised learning
Input: Sequential images 序列图像
Step1: Data integration 数据集成
Step2: Algorithm development 算法开发
Step3: Model training and evaluation 模型训练与评估
Output: Depth prediction and intrinsic images 深度预测与内在图像
8.5 [8.5] 2503.22262 Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion
[{'name': 'Songsong Yu, Yuxin Chen, Zhongang Qi, Zeke Xie, Yifan Wang, Lijun Wang, Ying Shan, Huchuan Lu'}]
Multi-view and Stereo Vision 多视角立体 v2
stereo conversion
evaluation metric
3D content production
Input: Monocular images 单目图像
Step1: Dataset creation 数据集创建
Step2: Empirical evaluation 实证评估
Step3: New metric proposal 新指标提出
Output: Enhanced stereo conversion model 改进的立体转换模型
8.5 [8.5] 2503.22309 A Dataset for Semantic Segmentation in the Presence of Unknowns
[{'name': 'Zakaria Laskar, Tomas Vojir, Matej Grcic, Iaroslav Melekhov, Shankar Gangisettye, Juho Kannala, Jiri Matas, Giorgos Tolias, C. V. Jawahar'}]
Image and Video Generation 图像生成 v2
semantic segmentation
autonomous driving
anomaly detection
Input: Real-world images from diverse environments 真实场景中的图像
Step 1: Dataset creation 数据集创建
Step 2: Labeling with closed-set and anomaly classes 标注闭集和异常类别
Step 3: Controlled evaluation 控制评估
Output: Comprehensive anomaly segmentation dataset 综合异常分割数据集
8.5 [8.5] 2503.22375 Data Quality Matters: Quantifying Image Quality Impact on Machine Learning Performance
[{'name': 'Christian Steinhauser, Philipp Reis, Hubert Padusinski, Jacob Langner, Eric Sax'}]
Autonomous Driving 自动驾驶 v2
image quality
machine learning
automotive perception
object detection
segmentation
Input: Modified images from automotive datasets 经过修改的汽车数据集中的图像
Step1: Data preparation 数据准备
Step2: Quantification of image deviations 图像偏差的量化
Step3: Performance evaluation of ML models 机器学习模型性能评估
Step4: Correlation analysis of image quality and performance 图像质量与性能的相关性分析
Output: Insights into the impact of image quality on ML performance 输出:图像质量对机器学习性能的影响见解
8.5 [8.5] 2503.22420 Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
[{'name': 'Jiangyong Huang, Baoxiong Jia, Yan Wang, Ziyu Zhu, Xiongkun Linghu, Qing Li, Song-Chun Zhu, Siyuan Huang'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
3D vision-language
benchmarking
QA tasks
Input: 3D vision-language models 3D视觉语言模型
Step1: Benchmark evaluation 基准评估
Step2: Object-centric testing 物体中心测试
Step3: Performance analysis 性能分析
Output: Comprehensive metrics for 3D-VL models 3D-VL模型的综合性能指标
8.5 [8.5] 2503.22462 SemAlign3D: Semantic Correspondence between RGB-Images through Aligning 3D Object-Class Representations
[{'name': 'Krispin Wandel, Hesheng Wang'}]
3D Object Recognition 物体识别 v2
3D object-class representations
semantic correspondence
Input: RGB images and monocular depth estimates RGB图像和单目深度估计
Step1: Build 3D object-class representations from depth estimates 从深度估计构建3D物体类别表示
Step2: Formulate alignment energy using gradient descent 使用梯度下降公式化对齐能量
Step3: Minimize alignment energy to establish correspondence 最小化对齐能量以建立对应关系
Output: Robust semantic correspondence across varying views 输出:在变化视角下的鲁棒语义对应
8.5 [8.5] 2503.22622 Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model
[{'name': 'Jangho Park, Taesung Kwon, Jong Chul Ye'}]
Image and Video Generation 图像生成与视频生成 v2
4D video generation
video diffusion models
spatio-temporal consistency
Input: Single monocular video 单个单目视频
Step1: Synthesize edge frames using video diffusion model 使用视频扩散模型合成边缘帧
Step2: Interpolate remaining frames to construct a coherent sampling grid 插值剩余帧以构建一致的采样网格
Output: Multi-view synchronized 4D video 生成多视角同步4D视频

Arxiv 2025-03-28

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2503.21082 Can Video Diffusion Model Reconstruct 4D Geometry?
[{'name': 'Jinjie Mai, Wenxuan Zhu, Haozhe Liu, Bing Li, Cheng Zheng, Jürgen Schmidhuber, Bernard Ghanem'}]
3D Reconstruction and Modeling 三维重建 v2
4D geometry reconstruction 4D几何重建
video diffusion model 视频扩散模型
Input: Monocular video 单目视频
Step1: Adapt a pointmap VAE from a pretrained video VAE 从预训练视频VAE适应一个点图VAE
Step2: Finetune a diffusion backbone in combined video and pointmap latent space 在结合视频和点图潜在空间中微调扩散骨干
Output: Coherent 4D pointmaps 统一的4D点图
9.5 [9.5] 2503.21104 StyledStreets: Multi-style Street Simulator with Spatial and Temporal Consistency
[{'name': 'Yuyin Chen, Yida Wang, Xueyang Zhang, Kun Zhan, Peng Jia, Yifei Zhan, Xianpeng Lang'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D reconstruction 三维重建
autonomous driving 自动驾驶
urban simulation 城市模拟
Input: Street scenes with multi-camera setups 街景与多摄像头设置
Step1: Pose optimization for cameras 摄像头姿态优化
Step2: Hybrid embedding for scene and style separation 场景与风格分离的混合嵌入
Step3: Uncertainty-aware rendering for consistent output 不确定性感知渲染以确保一致性
Output: Photo-realistic urban scenes with invariant geometry 输出: 保持几何不变的照片真实感城市场景
9.5 [9.5] 2503.21214 VoxRep: Enhancing 3D Spatial Understanding in 2D Vision-Language Models via Voxel Representation
[{'name': 'Alan Dao (Gia Tuan Dao), Norapat Buppodom'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D understanding 3D理解
Voxel representation 体素表示
Vision-Language Models 视觉语言模型
Input: 3D voxel grid 3D体素网格
Step1: Decompose into 2D slices 沿主要轴切分成2D切片
Step2: Format and feed into VLM 输入格式化并送入视觉语言模型
Step3: Aggregate and interpret features 聚合并解释特征
Output: Structured voxel semantics 输出结构化的体素语义
9.5 [9.5] 2503.21219 GenFusion: Closing the Loop between Reconstruction and Generation via Videos
[{'name': 'Sibo Wu, Congrong Xu, Binbin Huang, Andreas Geiger, Anpei Chen'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
video generation
novel view synthesis
Input: RGB-D videos RGB-D视频
Step1: Fine-tune video model 视频模型微调
Step2: Masked 3D reconstruction 3D重建遮罩处理
Step3: Cyclic fusion pipeline 循环融合流程
Output: Artifact-free 3D models 无伪影的三维模型
9.5 [9.5] 2503.21226 Frequency-Aware Gaussian Splatting Decomposition
[{'name': 'Yishai Lavi, Leo Segre, Shai Avidan'}]
Neural Rendering 神经渲染 v2
3D Gaussian Splatting
frequency decomposition
3D editing
view synthesis
Input: Images for Gaussian Splatting 输入图像用于高斯点云处理
Step1: Group 3D Gaussians based on frequency subbands 按照频率子带分组3D高斯
Step2: Apply dedicated regularization to maintain coherence 应用特殊正则化以保持一致性
Step3: Implement a progressive training scheme for optimization 实施渐进训练方案以优化
Output: Frequency-aware 3D representation with enhanced editing capabilities 输出: 增强编辑能力的频率感知3D表示
9.5 [9.5] 2503.21313 HORT: Monocular Hand-held Objects Reconstruction with Transformers
[{'name': 'Zerui Chen, Rolandos Alexandros Potamias, Shizhe Chen, Cordelia Schmid'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
dense point clouds
transformers
Input: Monocular images 单目图像
Step1: Generate sparse point cloud 生成稀疏点云
Step2: Refine to dense representation 精炼到密集表示
Step3: Jointly predict object point cloud and pose 共同预测物体点云和姿态
Output: High-resolution 3D point clouds 输出高分辨率3D点云
9.5 [9.5] 2503.21364 LandMarkSystem Technical Report
[{'name': 'Zhenxiang Ma, Zhenyu Yang, Miao Tao, Yuanzhen Zhou, Zeyu He, Yuchang Zhang, Rong Fu, Hengjie Li'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
Neural Radiance Fields
3D Gaussian Splatting
autonomous driving
Input: Multi-view images 多视角图像
Step1: Componentized model adaptation 组件化模型自适应
Step2: Distributed parallel computing 分布式并行计算
Step3: Dynamic loading strategy 动态加载策略
Output: Enhanced 3D reconstruction and rendering 改进的三维重建与渲染
9.5 [9.5] 2503.21449 Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving
[{'name': 'Lucas Nunes, Rodrigo Marcuzzi, Jens Behley, Cyrill Stachniss'}]
3D Generation 三维生成 v2
3D semantic generation
data annotation
autonomous driving
Input: Semantic scene data 语义场景数据
Step1: Train a diffusion model 训练扩散模型
Step2: Generate realistic 3D semantic scenes 生成真实的3D语义场景
Step3: Evaluate synthetic data for training 评估合成数据的训练效果
Output: Improved semantic segmentation performance 改进的语义分割性能
9.5 [9.5] 2503.21525 ICG-MVSNet: Learning Intra-view and Cross-view Relationships for Guidance in Multi-View Stereo
[{'name': 'Yuxi Hu, Jun Zhang, Zhe Zhang, Rafael Weilharter, Yuchen Rao, Kuangyi Chen, Runze Yuan, Friedrich Fraundorfer'}]
Multi-view Stereo 多视角立体 v2
Multi-view Stereo
3D reconstruction
depth estimation
Input: Series of overlapping images 重叠的图像序列
Step1: Feature extraction 特征提取
Step2: Intra-view feature fusion 视图内特征融合
Step3: Cross-view aggregation 跨视图聚合
Step4: Depth estimation 深度估计
Output: 3D point cloud 3D点云
9.5 [9.5] 2503.21581 AlignDiff: Learning Physically-Grounded Camera Alignment via Diffusion
[{'name': 'Liuyue Xie, Jiancong Guo, Ozan Cakmakci, Andre Araujo, Laszlo A. Jeni, Zhiheng Jia'}]
3D Perception and Calibration 三维感知与标定 v2
camera calibration
3D perception
diffusion model
Input: Video sequences 视频序列
Step1: Condition diffusion model with line embeddings 利用线嵌入条件化扩散模型
Step2: Edge-aware attention focuses on geometric features 边缘感知注意力聚焦几何特征
Step3: Joint estimation of intrinsic and extrinsic parameters 同时估计内外参数
Output: Accurate camera calibration outputs 准确的相机标定输出
9.5 [9.5] 2503.21659 InteractionMap: Improving Online Vectorized HDMap Construction with Interaction
[{'name': 'Kuang Wu, Chuan Yang, Zhanbin Li'}]
3D Reconstruction and Modeling 三维重建 v2
HD maps
autonomous driving
map vectorization
Input: High-definition map data 高精度地图数据
Step1: Enhance detectors using position relation embedding 增强检测器的位置信息嵌入
Step2: Key-frame-based hierarchical temporal fusion 基于关键帧的分层时间融合
Step3: Introduce geometry-aware classification loss 引入几何感知分类损失
Output: Improved vectorized HD map outputs 改进的矢量化高清地图输出
9.5 [9.5] 2503.21692 RapidPoseTriangulation: Multi-view Multi-person Whole-body Human Pose Triangulation in a Millisecond
[{'name': 'Daniel Bermuth, Alexander Poeppel, Wolfgang Reif'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D reconstruction
pose estimation
multi-view
Input: Multi-view images and 2D poses 多视角图像和2D姿势
Step1: Predict 2D poses for each image 预测每幅图像的2D姿势
Step2: Filter pairs of poses using previous 3D poses 使用先前3D姿势筛选姿势对
Step3: Triangulate to create 3D proposals 三角测量生成3D提案
Step4: Reproject and evaluate reprojection error 重新投影并评估重投影误差
Output: Accurate 3D human poses 准确的3D人类姿势
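The triangulation in Step 3 is typically the classic direct linear transform (DLT) followed by a reprojection-error check; a minimal noise-free two-view sketch with made-up projection matrices, not the paper's calibrated multi-view setup:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Triangulate one 3D point from two views via SVD of the DLT system."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                      # null-space vector = homogeneous 3D point
    return X[:3] / X[3]

def reproject(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Toy setup: two cameras offset along x, both looking down +z.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, -0.2, 4.0])
x1, x2 = reproject(P1, X_true), reproject(P2, X_true)
X_hat = triangulate_dlt(P1, P2, x1, x2)
err = np.linalg.norm(reproject(P1, X_hat) - x1)  # reprojection error
```

Step 4's filtering amounts to thresholding `err` across views before accepting a 3D proposal.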
9.5 [9.5] 2503.21732 SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
[{'name': 'Xianglong He, Zi-Xin Zou, Chia-Hao Chen, Yuan-Chen Guo, Ding Liang, Chun Yuan, Wanli Ouyang, Yan-Pei Cao, Yangguang Li'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
mesh modeling
high-resolution shapes
Input: Sparse-structured isosurface representation 稀疏结构的等值面表示
Step1: Frustum-aware sectional voxel training 视锥感知的分段体素训练
Step2: Differentiable mesh reconstruction 可微分网格重建
Step3: Shape modeling pipeline construction 形状建模管道构建
Output: High-resolution 3D models 高分辨率三维模型
9.5 [9.5] 2503.21745 3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models
[{'name': 'Yuhan Zhang, Mengchen Zhang, Tong Wu, Tengfei Wang, Gordon Wetzstein, Dahua Lin, Ziwei Liu'}]
3D Generation 三维生成 v2
3D evaluation 3D评估
3D generation 3D生成
human preference 人类偏好
Input: Text and image prompts 文本和图像提示
Step1: Develop 3DGen-Arena platform 开发3DGen-Arena平台
Step2: Gather human preferences 收集人类偏好
Step3: Train scoring models 训练评分模型
Output: 3DGen-Bench dataset 生成3DGen-Bench数据集
9.5 [9.5] 2503.21761 Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video
[{'name': 'David Yifan Yao, Albert J. Zhai, Shenlong Wang'}]
3D Reconstruction and Modeling 三维重建与建模 v2
4D modeling
3D reconstruction
dynamic scenes
optimization
Input: Casual video inputs 从普通视频输入
Step1: Multi-stage optimization framework 多阶段优化框架
Step2: Integration of pretrained models 集成预训练模型
Step3: Estimation of camera poses, static and dynamic geometry and motion 相机姿态、静态和动态几何与运动的估计
Output: Accurate 4D scene models 生成准确的4D场景模型
9.5 [9.5] 2503.21766 Stable-SCore: A Stable Registration-based Framework for 3D Shape Correspondence
[{'name': 'Haolin Liu, Xiaohang Zhan, Zizheng Yan, Zhongjin Luo, Yuxin Wen, Xiaoguang Han'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D Shape Correspondence
Registration-based Framework
Neural Rendering
Input: Source and target mesh 源网格和目标网格
Step1: Registration of source mesh to target mesh 源网格注册到目标网格
Step2: Establish dense correspondence between shapes 建立形状间的稠密对应关系
Step3: Apply Semantic Flow Guided Registration 使用语义流引导注册
Output: Stable dense correspondence output 稳定的稠密对应输出
9.5 [9.5] 2503.21767 Semantic Consistent Language Gaussian Splatting for Point-Level Open-vocabulary Querying
[{'name': 'Hairong Yin, Huangying Zhan, Yi Xu, Raymond A. Yeh'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
open-vocabulary querying
point-level querying
Input: 3D Gaussian representation 3D高斯表示
Step1: Utilize masklets for ground-truth generation 利用masklet生成基准真相
Step2: Implement a two-step querying process 实现两步查询过程
Output: Retrieved relevant 3D Gaussians 相关3D高斯的提取
9.5 [9.5] 2503.21778 HS-SLAM: Hybrid Representation with Structural Supervision for Improved Dense SLAM
[{'name': 'Ziren Gong, Fabio Tosi, Youmin Zhang, Stefano Mattoccia, Matteo Poggi'}]
Simultaneous Localization and Mapping (SLAM) 同时定位与地图构建 v2
Dense SLAM
3D reconstruction
Structural Supervision
Input: RGB-D data with potential structure scenes RGB-D 数据与潜在结构场景
Step1: Hybrid encoding network to enhance scene representation 混合编码网络以增强场景表示
Step2: Structural supervision for scene understanding 结构监督以理解场景
Step3: Active global bundle adjustment for consistency 主动式全局束调整以确保一致性
Output: Accurate dense maps with improved tracking and reconstruction 准确的密集地图及改进的跟踪与重建
9.2 [9.2] 2503.21751 Reconstructing Humans with a Biomechanically Accurate Skeleton
[{'name': 'Yan Xia, Xiaowei Zhou, Etienne Vouga, Qixing Huang, Georgios Pavlakos'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
biomechanical skeleton
transformer
Input: Single image 单幅图像
Step1: Generate pseudo ground truth 生成伪真实数据
Step2: Train transformer to estimate parameters 训练变换器以估计参数
Step3: Iterative refinement of pseudo labels 伪标签的迭代优化
Output: 3D human reconstruction 3D人体重建
8.5 [8.5] 2503.20936 LATTE-MV: Learning to Anticipate Table Tennis Hits from Monocular Videos
[{'name': 'Daniel Etaat, Dvij Kalaria, Nima Rahmanian, Shankar Sastry'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
anticipatory control
table tennis robotics
Input: Monocular videos 单目视频
Step1: Data integration 数据集成
Step2: 3D reconstruction 3D重建
Step3: Anticipatory control algorithm 开发预测控制算法
Output: Enhanced ball return rate 改进的回球率
8.5 [8.5] 2503.21099 Learning Class Prototypes for Unified Sparse Supervised 3D Object Detection
[{'name': 'Yun Zhu, Le Hui, Hang Yang, Jianjun Qian, Jin Xie, Jian Yang'}]
3D Object Detection 3D目标检测 v2
3D object detection
sparse supervision
prototypes
indoor and outdoor scenes
Input: Sparse supervised 3D object detection data 稀疏监督3D目标检测数据
Step1: Prototype-based object mining module 原型基础的对象挖掘模块
Step2: Optimal transport matching 最优传输匹配
Step3: Multi-label cooperative refinement module 多标签协同精练模块
Output: Enhanced detection performance 改进的检测性能
8.5 [8.5] 2503.21268 ClimbingCap: Multi-Modal Dataset and Method for Rock Climbing in World Coordinate
[{'name': 'Ming Yan, Xincheng Lin, Yuhua Luo, Shuqi Fan, Yudi Dai, Qixin Zhong, Lincai Zhong, Yuexin Ma, Lan Xu, Chenglu Wen, Siqi Shen, Cheng Wang'}]
Human Motion Recovery 人体运动恢复 v2
3D reconstruction
human motion recovery
autonomous driving
Input: RGB and LiDAR data
Step1: Collecting and annotating climbing motion data
Step2: Developing ClimbingCap method for motion reconstruction
Step3: Evaluating performance on climbing motion recovery
Output: Continuous 3D human climbing motion in global coordinates
8.5 [8.5] 2503.21338 UGNA-VPR: A Novel Training Paradigm for Visual Place Recognition Based on Uncertainty-Guided NeRF Augmentation
[{'name': 'Yehui Shen, Lei Zhang, Qingqiu Li, Xiongwei Zhao, Yue Wang, Huimin Lu, Xieyuanli Chen'}]
Visual Place Recognition 视觉地点识别 v2
Visual Place Recognition
NeRF
Data Augmentation
Autonomous Navigation
3D reconstruction
Input: Existing VPR dataset 现有VPR数据集
Step1: Train NeRF using existing VPR data 使用现有VPR数据训练NeRF
Step2: Identify high uncertainty places using uncertainty estimation network 使用不确定性估计网络识别高不确定性的位置
Step3: Generate synthetic observations with selected poses through NeRF 通过NeRF生成选定姿态的合成观测
Output: Enhanced VPR training data 改进的VPR训练数据
8.5 [8.5] 2503.21477 Fine-Grained Behavior and Lane Constraints Guided Trajectory Prediction Method
[{'name': 'Wenyi Xiong, Jian Chen, Ziheng Qi'}]
Autonomous Systems and Robotics 自主系统与机器人
trajectory prediction 轨迹预测
autonomous driving 自动驾驶
lane constraints 车道约束
Input: Trajectory data 轨迹数据
Step1: Behavioral intention recognition 行为意图识别
Step2: Lane constraint modeling 车道约束建模
Step3: Dual-stream architecture integration 双流架构集成
Step4: Trajectory proposal generation 轨迹提议生成
Step5: Point-level refinement 点级细化
Output: Fine-grained trajectory predictions 精细化轨迹预测
8.5 [8.5] 2503.21562 uLayout: Unified Room Layout Estimation for Perspective and Panoramic Images
[{'name': 'Jonathan Lee, Bolivar Solarte, Chin-Hsuan Wu, Jin-Cheng Jhang, Fu-En Wang, Yi-Hsuan Tsai, Min Sun'}]
3D Reconstruction and Modeling 三维重建 v2
room layout estimation
3D reconstruction
panoramic images
Input: Perspective and panoramic images 透视和全景图像
Step1: Project input images into equirectangular coordinates 将输入图像投影到等经纬度坐标
Step2: Use shared feature extractor with domain-specific conditioning 使用共享特征提取器并进行领域特定条件处理
Step3: Apply column-wise feature regression 应用逐列特征回归
Output: Estimated room layout geometries 估计的房间布局几何
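Projecting a perspective pixel into equirectangular (longitude/latitude) coordinates, as in Step 1, reduces to back-projecting the pixel to a camera ray and converting it to spherical angles; a minimal sketch assuming a simple pinhole intrinsic matrix (the focal length and image size here are hypothetical):

```python
import numpy as np

def pixel_to_equirect(u, v, K):
    """Map a perspective pixel to (longitude, latitude) in radians."""
    # Back-project to a unit camera ray via the inverse intrinsics.
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    x, y, z = ray / np.linalg.norm(ray)
    lon = np.arctan2(x, z)   # azimuth around the vertical axis
    lat = np.arcsin(-y)      # elevation; image y points down
    return lon, lat

# Hypothetical 90-degree-FoV camera on a 512x512 image.
f = 256.0
K = np.array([[f, 0, 256.0], [0, f, 256.0], [0, 0, 1.0]])
lon, lat = pixel_to_equirect(256.0, 256.0, K)  # principal point maps to (0, 0)
```

A pixel at the right image border of this 90° camera lands at longitude π/4, which is what lets perspective and panoramic inputs share one column-wise representation.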
8.5 [8.5] 2503.21723 OccRobNet : Occlusion Robust Network for Accurate 3D Interacting Hand-Object Pose Estimation
[{'name': 'Mallika Garg, Debashis Ghosh, Pyari Mohan Pradhan'}]
3D Reconstruction and Modeling 三维重建 v2
3D hand pose estimation
occlusion
autonomous systems
CNN
transformer
Input: RGB image RGB图像
Step1: Localizing hand joints using CNN 定位手关节采用CNN
Step2: Refining joint estimates using contextual information 使用上下文信息细化关节估计
Step3: Identifying joints with self-attention and cross-attention mechanisms 使用自注意力和交叉注意力机制识别关节
Output: Accurate 3D hand-object pose estimates 输出: 精确的3D手-物体姿态估计
8.5 [8.5] 2503.21755 VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
[{'name': 'Dian Zheng, Ziqi Huang, Hongbo Liu, Kai Zou, Yinan He, Fan Zhang, Yuanhan Zhang, Jingwen He, Wei-Shi Zheng, Yu Qiao, Ziwei Liu'}]
Image and Video Generation 图像生成和视频生成 v2
Video Generation 视频生成
Intrinsic Faithfulness 内在真实
Input: Video generative models 视频生成模型
Step1: Establish evaluation metrics 建立评估指标
Step2: Benchmark development 基准开发
Step3: Model assessment 模型评估
Output: Intrinsically faithful video generation outputs 本质真实的视频生成结果
8.5 [8.5] 2503.21779 X$^{2}$-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction
[{'name': 'Weihao Yu, Yuanhao Cai, Ruyi Zha, Zhiwen Fan, Chenxin Li, Yixuan Yuan'}]
3D Reconstruction and Modeling 三维重建 v2
4D CT reconstruction
Gaussian splatting
dynamic imaging
respiratory motion learning
Input: Projections of dynamic anatomical structures 3D动态解剖结构的投影
Step1: Model continuous anatomical motion 建模连续解剖运动
Step2: Apply radiative Gaussian splatting 应用辐射高斯点云
Step3: Implement self-supervised learning 实现自监督学习
Output: 4D CT reconstruction of continuous motion 输出:连续运动的4D CT重建
7.5 [7.5] 2503.21483 BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding
[{'name': 'Shuming Liu, Chen Zhao, Tianqi Xu, Bernard Ghanem'}]
VLM & VLA 视觉语言模型与对齐 v2
Video-Language Models
frame selection
video understanding
Input: Long-form videos 长视频
Step1: Frame selection strategy evaluation 帧选择策略评估
Step2: Implementation of inverse transform sampling 逆变换采样的实现
Step3: Performance assessment on video benchmarks 视频基准上的性能评估
Output: Improved video understanding performance 提升的视频理解性能
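Inverse transform sampling over a per-frame relevance distribution (Step 2 above) can be sketched as follows; the relevance scores are synthetic placeholders for the query-frame similarities the method would compute:

```python
import numpy as np

def sample_frames(scores, n_select, rng=None):
    """Pick frame indices by inverse transform sampling on the score CDF."""
    rng = rng if rng is not None else np.random.default_rng(0)
    p = np.asarray(scores, dtype=float)
    p = p / p.sum()                  # normalize scores to a distribution
    cdf = np.cumsum(p)
    u = rng.uniform(size=n_select)   # uniform draws in [0, 1)
    idx = np.searchsorted(cdf, u)    # invert the CDF
    return np.sort(idx)

# Synthetic relevance: frames around index 50 of a 100-frame clip matter most.
scores = np.exp(-((np.arange(100) - 50) / 10.0) ** 2)
frames = sample_frames(scores, n_select=8)
```

Relative to uniform frame striding, this concentrates the selected frames where the relevance mass is, which is the training-free trick the entry describes.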

Arxiv 2025-03-27

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2503.20168 EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis
[{'name': 'Sheng Miao, Jiaxin Huang, Dongfeng Bai, Xu Yan, Hongyu Zhou, Yue Wang, Bingbing Liu, Andreas Geiger, Yiyi Liao'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
autonomous driving
real-time rendering
Input: Multiple sparse images 多张稀疏图像
Step 1: Initialize noisy depth predictions 初始化噪声深度预测
Step 2: Process point cloud with 3D CNN 使用3D卷积神经网络处理点云
Step 3: Predict 3D Gaussian properties 预测3D高斯属性
Output: Real-time rendering of urban scenes 实时渲染城市场景
9.5 [9.5] 2503.20211 Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors
[{'name': 'Weilong Yan, Ming Li, Haipeng Li, Shuwei Shao, Robby T. Tan'}]
Depth Estimation 深度估计 v2
Depth Estimation 深度估计
Autonomous Driving 自动驾驶
Robustness 可靠性
Input: Monocular images 单目图像
Step1: Synthetic adaptation with motion structure knowledge 合成适应与运动结构知识
Step2: Real adaptation with consistency-reweighting strategy 实际适应与一致性加权策略
Step3: Depth estimation model training 深度估计模型训练
Output: Robust depth predictions 可靠的深度预测
9.5 [9.5] 2503.20220 DINeMo: Learning Neural Mesh Models with no 3D Annotations
[{'name': 'Weijie Guo, Guofeng Zhang, Wufei Ma, Alan Yuille'}]
3D Reconstruction and Modeling 三维重建 v2
3D pose estimation
neural mesh models
unlabeled data
autonomous systems
robotics
Input: Images of objects without 3D annotations 无3D标注的物体图像
Step1: Generate pseudo-correspondence 生成伪对应关系
Step2: Train neural mesh model using pseudo labels 使用伪标签训练神经网格模型
Step3: Evaluate performance on 3D pose estimation 在3D姿态估计上评估性能
Output: Accurate 3D pose estimates 准确的3D姿态估计
9.5 [9.5] 2503.20221 TC-GS: Tri-plane based compression for 3D Gaussian Splatting
[{'name': 'Taorui Wang, Zitong Yu, Yong Xu'}]
Neural Rendering 神经渲染 v2
3D Gaussian Splatting
Compression
Tri-plane
Input: Unorganized 3D Gaussian attributes 非结构化的3D高斯属性
Step1: Tri-plane encoding of attributes 三平面编码属性
Step2: KNN-based decoding for Gaussian distribution KNN解码高斯分布
Step3: Adaptive wavelet loss for high-frequency details 自适应小波损失处理高频细节
Output: Compressed 3D Gaussian representation 压缩后的3D高斯表示
9.5 [9.5] 2503.20519 MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
[{'name': 'Jinnan Chen, Lingting Zhu, Zeyu Hu, Shengju Qian, Yugang Chen, Xin Wang, Gim Hee Lee'}]
3D Generation 三维生成 v2
3D generation
masked auto-regressive transformer
Input: 3D data 3D数据
Step1: Pyramid VAE architecture development Pyramid VAE架构开发
Step2: Cascaded MAR generation implementation 级联MAR生成实现
Step3: Training with random masking and auto-regressive denoising 随机掩蔽和自回归去噪训练
Output: High-resolution 3D meshes 高分辨率3D网格
9.5 [9.5] 2503.20523 GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving
[{'name': 'Lloyd Russell, Anthony Hu, Lorenzo Bertoni, George Fedoseev, Jamie Shotton, Elahe Arani, Gianluca Corrado'}]
Generative Models for Autonomous Driving 自动驾驶的生成模型 v2
3D modeling
autonomous driving
scene simulation
generative models
Input: Structured conditioning parameters 结构化条件参数
Step1: Multi-camera video generation 多摄像头视频生成
Step2: Conditioning on driving scenarios 驾驶场景条件化
Step3: Fine-grained control over agent behavior 代理行为的细粒度控制
Output: High-resolution, temporally consistent videos 高分辨率、时间一致性的视频
9.5 [9.5] 2503.20784 FB-4D: Spatial-Temporal Coherent Dynamic 3D Content Generation with Feature Banks
[{'name': 'Jinwei Li, Huan-ang Gao, Wenyi Li, Haohan Chi, Chenyu Liu, Chenxi Du, Yiqian Liu, Mingju Gao, Guiyu Zhang, Zongzheng Zhang, Li Yi, Yao Yao, Jingwei Zhao, Hongyang Li, Yikai Wang, Hao Zhao'}]
3D Generation 三维生成 v2
4D generation
dynamic content generation
feature bank
spatial-temporal consistency
multi-view generation
Input: Multi-view and frame sequences 多视角和帧序列
Step1: Feature extraction 特征提取
Step2: Feature bank integration 特征库集成
Step3: Temporal generation algorithm development 时间生成算法开发
Step4: Model evaluation 模型评估
Output: Coherent dynamic 3D content 连贯的动态3D内容
9.2 [9.2] 2503.19947 Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders
[{'name': 'Paul Koch, Jörg Krüger, Ankit Chowdhury, Oliver Heimann'}]
Depth Estimation 深度估计 v2
depth understanding
vision-guided robotics
self-supervised learning
Input: RGB encoders with depth information RGB编码器与深度信息
Step1: Self-supervised training pipeline 自监督训练管道
Step2: Depth feature extraction 深度特征提取
Step3: Performance evaluation 性能评估
Output: Enhanced RGBD encoder 改进的RGBD编码器
9.0 [9.0] 2503.20654 AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports
[{'name': 'Xiangwen Zhang, Qian Zhang, Longfei Han, Qiang Qu, Xiaoming Chen'}]
Autonomous Driving 自动驾驶 v2
3D reconstruction
autonomous driving
vehicle collision
Input: Real-world accident reports 从真实事故报告中获取信息
Step1: Extract physical clues from reports 从报告中提取物理线索
Step2: Use physical simulator to replicate trajectories 使用物理模拟器生成碰撞轨迹
Step3: Fine-tune language model for scenario predictions 细调语言模型以预测场景
Output: Physically realistic vehicle collision videos 生成物理真实感的车辆碰撞视频
8.5 [8.5] 2503.19953 Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals
[{'name': 'Stefan Stojanov, David Wendt, Seungwoo Kim, Rahul Venkatesh, Kevin Feigelis, Jiajun Wu, Daniel LK Yamins'}]
Autonomous Systems and Robotics 自动驾驶系统与机器人技术 v2
motion estimation
self-supervised learning
optical flow
Input: Video data 视频数据
Step1: Flow and occlusion estimation 流动和遮挡估计
Step2: Optimize counterfactual probes 优化反事实探针
Step3: Model evaluation 模型评估
Output: Motion estimates 运动估计
8.5 [8.5] 2503.20011 Hyperdimensional Uncertainty Quantification for Multimodal Uncertainty Fusion in Autonomous Vehicles Perception
[{'name': 'Luke Chen, Junyao Wang, Trier Mortlock, Pramod Khargonekar, Mohammad Abdullah Al Faruque'}]
Autonomous Systems and Robotics 自动驾驶和机器人 v2
Uncertainty Quantification
Autonomous Vehicles
Multimodal Fusion
3D Object Detection
Input: Multimodal sensor inputs 多模态传感器输入
Step1: Feature extraction 特征提取
Step2: Uncertainty quantification 不确定性量化
Step3: Feature fusion 特征融合
Output: Enhanced perception and detection 改进的感知与检测
8.5 [8.5] 2503.20235 Leveraging 3D Geometric Priors in 2D Rotation Symmetry Detection
[{'name': 'Ahyun Seo, Minsu Cho'}]
3D Reconstruction and Modeling 三维重建 v2
3D symmetry detection 3D对称性检测
geometric priors 几何先验
Input: 2D images 2D图像
Step1: Predict rotation centers in 3D space 在3D空间中预测旋转中心
Step2: Vertex reconstruction enforcing 3D geometric priors 强制执行3D几何先验的顶点重建
Step3: Project results back to 2D 将结果投影回2D
Output: Detected rotation symmetry with enhanced accuracy 检测到的旋转对称性,具有更高的准确性
8.5 [8.5] 2503.20268 EGVD: Event-Guided Video Diffusion Model for Physically Realistic Large-Motion Frame Interpolation
[{'name': 'Ziran Zhang, Xiaohui Li, Yihao Liu, Yujin Wang, Yueting Chen, Tianfan Xue, Shi Guo'}]
Image and Video Generation 图像生成与视频生成 v2
video frame interpolation
event cameras
diffusion models
Input: Low-frame-rate RGB frames and event signals
Step1: Develop Multi-Modal Motion Condition Generator (MMCG) to integrate motion clues
Step2: Fine-tune stable video diffusion (SVD) model with conditions from MMCG
Step3: Evaluate generated frames for visual quality and fidelity
Output: Physically realistic intermediate video frames
8.5 [8.5] 2503.20291 CryoSAMU: Enhancing 3D Cryo-EM Density Maps of Protein Structures at Intermediate Resolution with Structure-Aware Multimodal U-Nets
[{'name': 'Chenwei Zhang, Anne Condon, Khanh Dao Duc'}]
3D Reconstruction and Modeling 三维重建 v2
3D cryo-EM
protein structure
deep learning
Input: 3D cryo-EM density maps 3D 冷冻电子显微镜密度图
Step1: Integrate structural information with map features 集成结构信息与图像特征
Step2: Train multimodal U-Net on curated datasets 训练多模态U-Net模型
Step3: Evaluate performance across various metrics 评估各类指标下的性能
Output: Enhanced cryo-EM maps 改进的冷冻电子显微镜图像
8.5 [8.5] 2503.20321 Recovering Dynamic 3D Sketches from Videos
[{'name': 'Jaeah Lee, Changwoon Choi, Young Min Kim, Jaesik Park'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
dynamic sketches
motion analysis
video-based 3D reconstruction
Input: Video frames 视频帧
Step1: Extract 3D point cloud motion guidance 提取3D点云运动引导
Step2: Deform parametric 3D curves 变形参数化的3D曲线
Step3: Optimize motion guidance 优化运动引导
Output: Compact dynamic 3D sketches 输出紧凑的动态3D草图
8.5 [8.5] 2503.20652 Imitating Radiological Scrolling: A Global-Local Attention Model for 3D Chest CT Volumes Multi-Label Anomaly Classification
[{'name': 'Theo Di Piazza, Carole Lazarus, Olivier Nempont, Loic Boussel'}]
3D Reconstruction and Modeling 三维重建 v2
3D CT scans
anomaly classification
global-local attention
Input: 3D CT volumes 三维CT体积
Step1: Emulate scrolling behavior 模拟滚动行为
Step2: Global-local attention model development 全局-局部注意力模型开发
Step3: Model evaluation on datasets 在数据集上评估模型
Output: Multi-label anomaly classification results 多标签异常分类结果
8.5 [8.5] 2503.20663 ARMO: Autoregressive Rigging for Multi-Category Objects
[{'name': 'Mingze Sun, Shiwei Mao, Keyi Chen, Yurun Chen, Shunlin Lu, Jingbo Wang, Junting Dong, Ruqi Huang'}]
3D Reconstruction and Modeling 三维重建 v2
3D modeling 三维建模
rigging 装配
autoregressive models 自回归模型
Input: 3D meshes 三维网格
Step1: Data integration 数据集成
Step2: Autoregressive model development 自回归模型开发
Step3: Skeleton prediction 骨骼预测
Output: Rigged 3D models 装配的三维模型
8.5 [8.5] 2503.20682 GLRD: Global-Local Collaborative Reason and Debate with PSL for 3D Open-Vocabulary Detection
[{'name': 'Xingyu Peng, Si Liu, Chen Gao, Yan Bai, Beipeng Mu, Xiaofei Wang, Huaxia Xia'}]
3D Open-Vocabulary Detection 3D开集检测 v2
3D Open-Vocabulary Detection
LiDAR
point clouds
Input: LiDAR point clouds LiDAR点云
Step1: Generate initial detection results 生成初步检测结果
Step2: Analyze scene context 分析场景上下文
Step3: Refine detection using common sense reasoning 精确利用常识推理修正检测结果
Step4: Apply balance schemes to improve class representation 应用平衡机制以改善类别表示
Output: Improved detection results with topic adaptability 输出: 改进的具有适应性的检测结果
8.5 [8.5] 2503.20746 PhysGen3D: Crafting a Miniature Interactive World from a Single Image
[{'name': 'Boyuan Chen, Hanxiao Jiang, Shaowei Liu, Saurabh Gupta, Yunzhu Li, Hao Zhao, Shenlong Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
interactive simulation
video generation
Input: Single image 单一图像
Step1: Estimate 3D shapes 估计三维形状
Step2: Compute physical and lighting properties 计算物理和光照属性
Step3: Generate interactive 3D scene 生成互动的三维场景
Output: Realistic video generation 真实视频生成
8.5 [8.5] 2503.20776 Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
[{'name': 'Shijie Zhou, Hui Ren, Yijia Weng, Shuwang Zhang, Zhen Wang, Dejia Xu, Zhiwen Fan, Suya You, Zhangyang Wang, Leonidas Guibas, Achuta Kadambi'}]
4D Representation and Reconstruction 4D 表示与重建 v2
4D representation 4D表示
Gaussian Splatting 高斯点云
Monocular Video 单目视频
Input: Monocular video 单目视频
Step 1: Dynamic optimization 动态优化
Step 2: Gaussian feature field distillation 高斯特征场蒸馏
Step 3: 4D scene reconstruction 4D场景重建
Output: Interactive 4D agentic AI 交互式4D智能AI
7.5 [7.5] 2503.20314 Wan: Open and Advanced Large-Scale Video Generative Models
[{'name': 'WanTeam, :, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, Jianyuan Zeng, Jiayu Wang, Jingfeng Zhang, Jingren Zhou, Jinkai Wang, Jixuan Chen, Kai Zhu, Kang Zhao, Keyu Yan, Lianghua Huang, Mengyang Feng, Ningyi Zhang, Pandeng Li, Pingyu Wu, Ruihang Chu, Ruili Feng, Shiwei Zhang, Siyang Sun, Tao Fang, Tianxing Wang, Tianyi Gui, Tingyu Weng, Tong Shen, Wei Lin, Wei Wang, Wei Wang, Wenmeng Zhou, Wente Wang, Wenting Shen, Wenyuan Yu, Xianzhong Shi, Xiaoming Huang, Xin Xu, Yan Kou, Yangyu Lv, Yifei Li, Yijing Liu, Yiming Wang, Yingya Zhang, Yitong Huang, Yong Li, You Wu, Yu Liu, Yulin Pan, Yun Zheng, Yuntao Hong, Yupeng Shi, Yutong Feng, Zeyinzi Jiang, Zhen Han, Zhi-Fan Wu, Ziyu Liu'}]
Image and Video Generation 图像与视频生成 v2
video generation
generative models
diffusion models
Input: Large-scale images and videos 大规模图像和视频
Step1: Data curation 数据整理
Step2: Model design and optimization 模型设计与优化
Step3: Benchmarking and evaluation 基准测试与评估
Output: Advanced video generative models 高级视频生成模型

Arxiv 2025-03-26

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2503.19332 Divide-and-Conquer: Dual-Hierarchical Optimization for Semantic 4D Gaussian Spatting
[{'name': 'Zhiying Yan, Yiyuan Liang, Shilv Cai, Tao Zhang, Sheng Zhong, Luxin Yan, Xu Zou'}]
3D Reconstruction and Modeling 三维重建 v2
Dynamic Scene Reconstruction
Gaussian Splatting
Input: Dynamic scenes 动态场景
Step1: Data separation 数据分离
Step2: Hierarchical optimization 分层优化
Step3: Gaussian management 高斯管理
Output: Enhanced dynamic scene understanding 改进的动态场景理解
9.5 [9.5] 2503.19340 BADGR: Bundle Adjustment Diffusion Conditioned by GRadients for Wide-Baseline Floor Plan Reconstruction
[{'name': 'Yuguang Li, Ivaylo Boyadzhiev, Zixuan Liu, Linda Shapiro, Alex Colburn'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
bundle adjustment
RGB panorama
layout generation
Input: Wide-baseline RGB panoramas 宽基线RGB全景图
Step1: Camera pose and floor plan initialization 相机位姿和平面布局初始化
Step2: Bundle adjustment and refinement 捆绑调整与优化
Step3: Integration of layout-structural constraints 布局结构约束的整合
Output: Accurate camera poses and floor plans 准确的相机位姿和楼层平面图
9.5 [9.5] 2503.19373 DeClotH: Decomposable 3D Cloth and Human Body Reconstruction from a Single Image
[{'name': 'Hyeongjin Nam, Donghwan Kim, Jeongtaek Oh, Kyoung Mu Lee'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
human body reconstruction
cloth modeling
Input: Single image 单幅图像
Step1: Utilize 3D template models for regularization 利用三维模板模型进行正则化
Step2: Develop a specialized cloth diffusion model 开发专门的布料扩散模型
Step3: Reconstruct 3D cloth and human body based on templates 基于模板重建三维布料和人体
Output: Decomposed 3D model of cloth and human body 输出:分解的三维布料和人体模型
9.5 [9.5] 2503.19443 COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting
[{'name': 'Jiaxin Zhang, Junjun Jiang, Youyu Chen, Kui Jiang, Xianming Liu'}]
3D Reconstruction and Modeling 三维重建 v2
3D segmentation
Gaussian splatting
visual quality
scene understanding
object boundaries
Input: Multi-view images 多视角图像
Step1: Joint optimization of semantics and visual information 联合优化语义与视觉信息
Step2: Boundary-adaptive Gaussian splitting technique 边界自适应高斯分裂技术
Step3: Texture restoration for visual quality 视觉质量的纹理恢复
Output: Improved segmentation accuracy and clear boundaries 改进的分割精度和清晰边界
9.5 [9.5] 2503.19448 Towards Robust Time-of-Flight Depth Denoising with Confidence-Aware Diffusion Model
[{'name': 'Changyong He, Jin Zeng, Jiawei Zhang, Jiajie Guo'}]
Depth Estimation 深度估计 v2
Depth Denoising
Time-of-Flight
Diffusion Models
3D Reconstruction
Input: Raw correlation measurements from ToF sensors 从时间飞行传感器的原始相关测量开始
Step1: Dynamic range normalization 动态范围归一化
Step2: Apply diffusion model in denoising 应用扩散模型进行去噪
Step3: Confidence-aware guidance integration 集成基于置信度的指导
Output: Enhanced depth maps 改进的深度图
9.5 [9.5] 2503.19452 SparseGS-W: Sparse-View 3D Gaussian Splatting in the Wild with Generative Priors
[{'name': 'Yiqing Li, Xuan Wang, Jiawei Wu, Yikun Ma, Zhi Jin'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
few-shot learning
novel view synthesis
occlusion handling
Input: Unconstrained in-the-wild images from various views 来自不同视角的野外图像
Step1: Multi-view stereo for camera parameters 多视角立体视觉技术获取相机参数
Step2: Gaussian optimization with Constrained Novel-View Enhancement 高斯优化与约束新视角增强模块结合
Step3: Occlusion handling to improve view consistency 处理遮挡以提高视角一致性
Output: High-quality novel views of the scene 该场景的高质量新视角
9.5 [9.5] 2503.19458 GaussianUDF: Inferring Unsigned Distance Functions through 3D Gaussian Splatting
[{'name': 'Shujuan Li, Yu-Shen Liu, Zhizhong Han'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
unsigned distance functions
multi-view images
3D Gaussian Splatting
Input: Multi-view images 多视角图像
Step1: Overfit 2D Gaussian planes on surfaces 在表面上过拟合2D高斯平面
Step2: Use self-supervision and gradient-based inference for UDF supervision 利用自监督和基于梯度的推理进行UDF监督
Step3: Produce continuous UDF representations 生成连续的UDF表示
Output: Accurate reconstruction of open surfaces 精确重建开放表面
9.5 [9.5] 2503.19543 Scene-agnostic Pose Regression for Visual Localization
[{'name': 'Junwei Zheng, Ruiping Liu, Yufan Chen, Zhenfang Chen, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen'}]
Visual Odometry 视觉里程计 v2
Pose Regression 姿态回归
Visual Localization 视觉定位
Camera Poses 相机姿态
Input: Sequence of images along a trajectory 沿轨迹的图像序列
Step1: Model input preparation 模型输入准备
Step2: Pose prediction 相机姿态预测
Step3: Evaluation of pose accuracy 姿态精度评估
Output: Predictions of 6D camera poses 6D相机姿态预测
9.5 [9.5] 2503.19703 High-Quality Spatial Reconstruction and Orthoimage Generation Using Efficient 2D Gaussian Splatting
[{'name': 'Qian Wang, Zhihao Zhan, Jialei He, Zhituo Tu, Xiang Zhu, Jie Yuan'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
digital orthophoto maps
2D Gaussian Splatting
depth estimation
Input: Multi-view images and terrain data 多视角图像及地形数据
Step1: Generate depth maps 生成深度图
Step2: Apply 2D Gaussian Splatting method 应用2D高斯点云方法
Step3: Render True Digital Orthophoto Maps (TDOMs) 渲染真正数字正交影像图(TDOMs)
Output: High-quality spatial reconstruction 高质量空间重建
9.5 [9.5] 2503.19776 Resilient Sensor Fusion under Adverse Sensor Failures via Multi-Modal Expert Fusion
[{'name': 'Konyul Park, Yecheol Kim, Daehun Kim, Jun Won Choi'}]
Autonomous Systems and Robotics 自动驾驶与机器人 v2
LiDAR
camera
sensor fusion
3D object detection
autonomous driving
Input: Multi-modal sensor data 多模态传感器数据
Step1: Integration of LiDAR and camera features LiDAR与相机特征的集成
Step2: Development of Multi-Expert Decoding framework 多专家解码框架的开发
Step3: Performance evaluation on benchmark 数据集上的性能评估
Output: Robust 3D object detection results 稳健的三维物体检测结果
9.5 [9.5] 2503.19912 SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining
[{'name': 'Xiang Xu, Lingdong Kong, Hui Shuai, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Qingshan Liu'}]
3D Reconstruction and Modeling 三维重建 v2
LiDAR representation learning
autonomous driving
3D perception
spatiotemporal consistency
Input: Consecutive LiDAR-camera pairs LiDAR-相机配对
Step1: View consistency alignment 视图一致性对齐
Step2: Dense-to-sparse consistency regularization 密集到稀疏一致性正则化
Step3: Flow-based contrastive learning 基于流的对比学习
Step4: Temporal voting strategy 时间投票策略
Output: Enhanced LiDAR-based perception 改进的基于LiDAR的感知
9.5 [9.5] 2503.19913 PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model
[{'name': 'Mingju Gao, Yike Pan, Huan-ang Gao, Zongzheng Zhang, Wenyi Li, Hao Dong, Hao Tang, Li Yi, Hao Zhao'}]
3D Reconstruction and Modeling 三维重建与建模 v2
4D reconstruction 四维重建
part-level dynamics 部分级动态
autonomous robotics 自主机器人
Input: Multi-view images 多视角图像
Step1: Data integration 数据集成
Step2: 4D reconstruction framework development 四维重建框架开发
Step3: Motion and appearance learning 动作与外观学习
Output: Enhanced representations of part-level dynamics 改进的部分动态表示
9.5 [9.5] 2503.19914 Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models
[{'name': 'Sangwon Beak, Hyeonwoo Kim, Hanbyul Joo'}]
3D Reconstruction and Modeling 三维重建 v2
3D spatial relationships
object-object relationships
diffusion models
synthetic 3D samples
Input: Synthesized 2D images 从合成的 2D 图像获取数据
Step1: Generate 3D samples from 2D images 从 2D 图像生成 3D 样本
Step2: Train score-based OOR diffusion model 训练基于分数的 OOR 扩散模型
Step3: Extend to multi-object OOR 扩展到多对象 OOR
Output: Distributions of spatial relationships 输出空间关系的分布
9.2 [9.2] 2503.19011 RomanTex: Decoupling 3D-aware Rotary Positional Embedded Multi-Attention Network for Texture Synthesis
[{'name': 'Yifei Feng, Mingxin Yang, Shuhui Yang, Sheng Zhang, Jiaao Yu, Zibo Zhao, Yuhong Liu, Jie Jiang, Chunchao Guo'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D-aware texture generation 3D感知纹理生成
multi-view consistency 多视角一致性
texture synthesis 纹理合成
Input: 3D geometries and multi-view images 3D几何体和多视角图像
Step1: Integrate multi-view image information 整合多视角图像信息
Step2: Develop a multi-attention texture synthesis network 开发多注意力纹理合成网络
Step3: Apply geometry-related Classifier-Free Guidance (CFG) 应用与几何相关的无分类器引导 (CFG)
Output: High-quality and consistent texture maps 输出: 高质量且一致的纹理图
9.0 [9.0] 2503.19207 FRESA:Feedforward Reconstruction of Personalized Skinned Avatars from Few Images
[{'name': 'Rong Wang, Fabian Prada, Ziyan Wang, Zhongshi Jiang, Chengxiang Yin, Junxuan Li, Shunsuke Saito, Igor Santesteban, Javier Romero, Rohan Joshi, Hongdong Li, Jason Saragih, Yaser Sheikh'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
human avatars
animation
feedforward
multi-frame aggregation
Input: Casual phone photos 从手机照片获取输入
Step1: 3D canonicalization 进行三维规范化
Step2: Multi-frame feature aggregation 多帧特征聚合
Step3: Avatar shape and animation inference 推断头像形状和动画
Output: Personalized 3D avatar 生成个性化的三维头像
8.5 [8.5] 2503.19157 HOIGPT: Learning Long Sequence Hand-Object Interaction with Language Models
[{'name': 'Mingzhen Huang, Fu-Jen Chu, Bugra Tekin, Kevin J Liang, Haoyu Ma, Weiyao Wang, Xingyu Chen, Pierre Gleize, Hongfei Xue, Siwei Lyu, Kris Kitani, Matt Feiszli, Hao Tang'}]
3D Reconstruction and Modeling 三维重建 v2
3D hand-object interaction 3D手-物体交互
language models 语言模型
Input: Text prompts or partial HOI sequences 文本提示或部分HOI序列
Step1: HOI sequence tokenization HOI序列的标记化
Step2: Bidirectional transformation between HOI sequences and text HOI序列与文本间的双向变换
Step3: HOI generation or completion HOI生成或补全
Output: Generated 3D hand-object interaction sequences 生成的3D手-物体交互序列
8.5 [8.5] 2503.19199 Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces
[{'name': 'Chenyangguang Zhang, Alexandros Delitzas, Fangjinhua Wang, Ruida Zhang, Xiangyang Ji, Marc Pollefeys, Francis Engelmann'}]
3D Reconstruction and Modeling 三维重建 v2
3D scene graphs
functional relationships
RGB-D images
Input: Posed RGB-D images 从RGB-D图像输入
Step1: Predicting objects and interactive elements 预测物体和交互元素
Step2: Inferring functional relationships 推断功能关系
Output: Functional 3D scene graph 功能3D场景图
8.5 [8.5] 2503.19276 Context-Aware Semantic Segmentation: Enhancing Pixel-Level Understanding with Large Language Models for Advanced Vision Applications
[{'name': 'Ben Rahman'}]
Semantic Segmentation 语义分割 v2
Semantic Segmentation
Large Language Models
Autonomous Driving
Context-Aware Systems
Input: Images with complex scenes 复杂场景中的图像
Step1: Integrate visual features and language embeddings 整合视觉特征和语言嵌入
Step2: Implement a Cross-Attention Mechanism 实现跨注意力机制
Step3: Utilize Graph Neural Networks for object relationships 使用图神经网络处理对象间的关系
Output: Enhanced pixel-level and contextual understanding 改进的像素级和上下文理解
8.5 [8.5] 2503.19307 Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation
[{'name': 'Zhuoran Zhao, Linlin Yang, Pengzhan Sun, Pan Hui, Angela Yao'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
hand pose estimation
synthetic data
Input: Synthetic and real data 合成和真实数据
Step1: Synthetic data analysis 合成数据分析
Step2: Gap analysis 差距分析
Step3: Data synthesis pipeline proposal 提出数据合成流程
Output: Enhanced hand pose estimation 改进的手部姿态估计
8.5 [8.5] 2503.19308 A Comprehensive Analysis of Mamba for 3D Volumetric Medical Image Segmentation
[{'name': 'Chaohan Wang, Yutong Xie, Qi Chen, Yuyin Zhou, Qi Wu'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D segmentation 三维分割
medical imaging 医学成像
State Space Models 状态空间模型
Input: High-resolution 3D medical images 高分辨率3D医学图像
Step1: Evaluate Mamba against Transformers 将Mamba与Transformers进行对比评估
Step2: Implement multi-scale representation learning 实现多尺度表征学习
Step3: Benchmark against public datasets 在公开数据集上进行基准测试
Output: Comparative analysis of segmentation performance 输出:分割性能比较分析
8.5 [8.5] 2503.19355 ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models
[{'name': 'Dohwan Ko, Sihyeon Kim, Yumin Suh, Vijay Kumar B. G, Minseo Yoon, Manmohan Chandraker, Hyunwoo J. Kim'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
spatio-temporal reasoning
Vision-Language Models
autonomous driving
Input: Real-world videos with 3D annotations 实际视频与3D注释
Step1: Dataset construction 数据集构建
Step2: Kinematic instruction tuning 运动指令调优
Step3: Model training and evaluation 模型训练与评估
Output: Enhanced Vision-Language Model 改进的视觉语言模型
8.5 [8.5] 2503.19358 From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting
[{'name': 'Zhiwei Huang, Hailin Yu, Yichun Shentu, Jin Yuan, Guofeng Zhang'}]
Camera Relocalization 相机重定位 v2
camera relocalization
3D reconstruction
Gaussian splatting
Input: Query image 查询图像
Step1: Sparse feature extraction 稀疏特征提取
Step2: Initial pose estimation using sparse matching 根据稀疏匹配初步估计位姿
Step3: Dense feature matching for pose refinement 通过密集特征匹配进行位姿精炼
Output: Accurate camera pose result 精确的相机位姿结果
8.5 [8.5] 2503.19391 TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception
[{'name': 'Zhiying Song, Lei Yang, Fuxi Wen, Jun Li'}]
Autonomous Systems and Robotics 自动驾驶与机器人系统 v2
cooperative perception
trajectory alignment
autonomous driving
feature fusion
Input: Multi-frame LiDAR sequences 多帧激光雷达序列
Step1: Learning feature trajectories 学习特征轨迹
Step2: Generating attention points 生成注意力点
Step3: Aligning features against trajectories 将特征与轨迹对齐
Output: Enhanced cooperative perception 改进的协作感知
8.5 [8.5] 2503.19405 Multi-modal 3D Pose and Shape Estimation with Computed Tomography
[{'name': 'Mingxiao Tu, Hoijoon Jung, Alireza Moghadam, Jineel Raythatha, Lachlan Allan, Jeremy Hsu, Andre Kyme, Jinman Kim'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D pose estimation 3D姿态估计
shape estimation 形状估计
computed tomography 计算机断层扫描
multi-modal fusion 多模态融合
Input: Computed tomography (CT) scans and depth maps 计算机断层扫描(CT)和深度图
Step1: Feature extraction 特征提取
Step2: Probabilistic correspondence alignment 概率对应对齐
Step3: Pose and shape estimation 位置和形状估计
Step4: Parameter mixing model 参数混合模型
Output: Accurate 3D human mesh model 准确的三维人类网格模型
8.5 [8.5] 2503.19721 EventMamba: Enhancing Spatio-Temporal Locality with State Space Models for Event-Based Video Reconstruction
[{'name': 'Chengjie Ge, Xueyang Fu, Peng He, Kunyu Wang, Chengzhi Cao, Zheng-Jun Zha'}]
Video Generation 视频生成 v2
event-based video reconstruction
spatio-temporal locality
Mamba
computer vision
neural networks
Input: Event data 事件数据
Step1: Implement random window offset strategy 实施随机窗口偏移策略
Step2: Apply Hilbert space filling curve mechanism 应用希尔伯特空间填充曲线机制
Step3: Model evaluation and performance benchmarking 模型评估与性能基准测试
Output: Enhanced reconstructed video frames 改进的视频重建帧
8.5 [8.5] 2503.19755 ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation
[{'name': 'Haoyu Fu, Diankun Zhang, Zongchuang Zhao, Jianfeng Cui, Dingkang Liang, Chong Zhang, Dingyuan Zhang, Hongwei Xie, Bing Wang, Xiang Bai'}]
Autonomous Driving 自动驾驶 v2
autonomous driving
vision-language models
trajectory prediction
Input: Vision-language instructed action generation 视觉语言指令的动作生成
Step1: Combine QT-Former for temporal context aggregation 结合QT-Former进行时间上下文聚合
Step2: Utilize LLM for driving scenario reasoning 利用大型语言模型进行驾驶场景推理
Step3: Implement a generative planner for trajectory prediction 实施生成式规划器进行轨迹预测
Output: Enhanced closed-loop driving performance 改进的闭环驾驶性能
8.5 [8.5] 2503.19764 OpenLex3D: A New Evaluation Benchmark for Open-Vocabulary 3D Scene Representations
[{'name': 'Christina Kassab, Sacha Morin, Martin Büchner, Matías Mattamala, Kumaraditya Gupta, Abhinav Valada, Liam Paull, Maurice Fallon'}]
3D Scene Representation 三维场景表示 v2
3D scene representation
open-vocabulary
benchmark
Input: 3D scene representations 三维场景表示
Step1: Open-set category labeling 开放集类别标注
Step2: Benchmark dataset creation 基准数据集创建
Step3: Evaluation on semantic segmentation 语义分割评估
Step4: Evaluation on object retrieval 对象检索评估
Output: OpenLex3D benchmark dataset OpenLex3D基准数据集
8.0 [8.0] 2503.19654 RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models
[{'name': 'Mehdi Moshtaghi, Siavash H. Khajavi, Joni Pajarinen'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models 视觉语言模型
RGB-Thermal understanding RGB-热成像理解
Input: RGB-Thermal image pairs RGB-热成像对
Step1: Comprehensive evaluation framework 建立全面评估框架
Step2: Annotation of Yes/No questions 对是/否问题的标注
Step3: Performance evaluation on VLMs 对视觉语言模型的性能评估
Output: Benchmark for assessing VLMs 性能评估基准
8.0 [8.0] 2503.19794 PAVE: Patching and Adapting Video Large Language Models
[{'name': 'Zhuoming Liu, Yiquan Li, Khoi Duc Nguyen, Yiwu Zhong, Yin Li'}]
VLM & VLA 视觉语言模型与视觉语言对齐 v2
Video LLMs
3D reasoning
multimodal learning
Input: Pre-trained Video LLMs and auxiliary signals 预训练视频大语言模型和附加信号
Step1: Insert lightweight adapters for downstream tasks 插入轻量级适配器以适应下游任务
Step2: Fuse video with other signals 融合视频与其他信号
Step3: Evaluate the model across tasks 评估模型在不同任务上的表现
Output: Improved model performance 改进的模型表现
7.5 [7.5] 2503.19325 Long-Context Autoregressive Video Modeling with Next-Frame Prediction
[{'name': 'Yuchao Gu, Weijia Mao, Mike Zheng Shou'}]
Image and Video Generation 图像生成与视频生成 v2
video generation
autoregressive modeling
temporal context
Input: Video data 视频数据
Step 1: Introduce FAR for video autoregressive modeling 引入FAR用于视频自回归建模
Step 2: Implement FlexRoPE for temporal decay 实现FlexRoPE以进行时间衰减
Step 3: Apply long short-term context modeling 应用长短期上下文建模
Output: State-of-the-art video generation 先进的视频生成
7.5 [7.5] 2503.19462 AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset
[{'name': 'Haiyu Zhang, Xinyuan Chen, Yaohui Wang, Xihui Liu, Yunhong Wang, Yu Qiao'}]
Image and Video Generation 图像生成与视频生成 v2
video generation
diffusion models
synthetic dataset
Input: Pretrained video diffusion model 预训练视频扩散模型
Step 1: Generate synthetic dataset from denoising trajectories 从去噪轨迹生成合成数据集
Step 2: Design trajectory-based few-step guidance 设计基于轨迹的少步指导
Step 3: Implement adversarial training to align output distribution 实施对抗训练以对齐输出分布
Output: Accelerated video generation 加速视频生成
7.5 [7.5] 2503.19839 FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model
[{'name': 'Jun Zhou, Jiahao Li, Zunnan Xu, Hanhui Li, Yiji Cheng, Fa-Ting Hong, Qin Lin, Qinglin Lu, Xiaodan Liang'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
image editing
vision language models
fine-grained editing
Input: User editing instructions 用户编辑指令
Step1: Integrate region tokens 集成区域标记
Step2: Use VLM for comprehension 使用视觉语言模型进行理解
Step3: Apply diffusion model for editing 应用扩散模型进行编辑
Output: Edited images 生成的编辑图像
7.5 [7.5] 2503.19910 CoLLM: A Large Language Model for Composed Image Retrieval
[{'name': 'Chuong Huynh, Jinyu Yang, Ashish Tawari, Mubarak Shah, Son Tran, Raffay Hamid, Trishul Chilimbi, Abhinav Shrivastava'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Composed Image Retrieval
Vision-Language Models
Large Language Models
Input: Image-caption pairs 图像-字幕对
Step1: Dynamic triplet synthesis 动态三元组合成
Step2: Model training 模型训练
Output: Enhanced composed image retrieval systems 改进的组合图像检索系统

Arxiv 2025-03-25

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2503.17467 High Efficiency Wiener Filter-based Point Cloud Quality Enhancement for MPEG G-PCC
[{'name': 'Yuxuan Wei, Zehan Wang, Tian Guo, Hao Liu, Liquan Shen, Hui Yuan'}]
3D Reconstruction and Modeling 三维重建 v2
point cloud compression
Wiener filter
3D reconstruction 三维重建
Input: Point clouds 点云
Step1: Introduce basic Wiener filter 基本维纳滤波器引入
Step2: Improve filter with coefficients inheritance and variance-based classification 改善滤波器,引入系数继承和基于方差的分类
Step3: Fast nearest neighbor search using Morton code 快速最近邻搜索,使用Morton编码
Output: Enhanced point cloud quality 改进的点云质量
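The Morton-code neighbor search in Step3 above is a standard spatial-hashing trick: interleaving the bits of quantized x/y/z coordinates gives codes that are numerically close for spatially close points, so sorting by code groups neighbors together. A minimal sketch for 10-bit coordinates (illustrative only, not the paper's implementation):

```python
def part1by2(n: int) -> int:
    # Spread the low 10 bits of n so there are two zero bits
    # between consecutive bits (standard magic-number interleave).
    n &= 0x3FF
    n = (n | (n << 16)) & 0xFF0000FF
    n = (n | (n << 8)) & 0x0300F00F
    n = (n | (n << 4)) & 0x030C30C3
    n = (n | (n << 2)) & 0x09249249
    return n

def morton3d(x: int, y: int, z: int) -> int:
    # Interleave bits as ...z1y1x1 z0y0x0; sorting points by this
    # code clusters spatial neighbors for fast approximate search.
    return part1by2(x) | (part1by2(y) << 1) | (part1by2(z) << 2)
```

Sorting a point cloud by `morton3d` of its quantized coordinates then lets nearest-neighbor candidates be found in a small window of the sorted array.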
9.5 [9.5] 2503.17486 ProtoGS: Efficient and High-Quality Rendering with 3D Gaussian Prototypes
[{'name': 'Zhengqing Gao, Dongting Hu, Jia-Wang Bian, Huan Fu, Yan Li, Tongliang Liu, Mingming Gong, Kun Zhang'}]
Neural Rendering 神经渲染 v2
3D Gaussian Splatting
novel view synthesis
efficient rendering
Input: Gaussian primitives 高斯原语
Step1: Grouping Gaussians into prototypes 组合同类高斯点为原型
Step2: Clustering using K-means 使用K均值聚类
Step3: Joint optimization of anchor points and prototypes 对锚点和原型进行联合优化
Output: Efficient and high-quality rendering 高效且高质量的渲染
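Step2 of the ProtoGS entry is plain K-means over Gaussian primitives. A minimal sketch clustering only the 3D centers (function name, array shapes, and the deterministic initialization are our assumptions; the paper additionally aggregates per-cluster appearance attributes):

```python
import numpy as np

def kmeans_prototypes(centers, k, iters=20):
    """Group Gaussian centers of shape (N, 3) into k prototypes
    with plain K-means. Illustrative stand-in for prototype
    construction, not the paper's implementation."""
    centers = np.asarray(centers, dtype=float)
    # Deterministic spread-out initialization: evenly spaced samples.
    init = np.linspace(0, len(centers) - 1, k).astype(int)
    protos = centers[init].copy()
    labels = np.zeros(len(centers), dtype=int)
    for _ in range(iters):
        # Assign each Gaussian center to its nearest prototype.
        d = np.linalg.norm(centers[:, None, :] - protos[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # Re-estimate each prototype as the mean of its members.
        for j in range(k):
            members = centers[labels == j]
            if len(members):
                protos[j] = members.mean(axis=0)
    return protos, labels
```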
9.5 [9.5] 2503.17668 3D Modeling: Camera Movement Estimation and path Correction for SFM Model using the Combination of Modified A-SIFT and Stereo System
[{'name': 'Usha Kumari, Shuvendu Rana'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
Structure From Motion
camera movement
Affine SIFT
Input: Multi-view images 多视角图像
Step1: Extract matching points 提取匹配点
Step2: Camera rotation estimation 相机旋转估计
Step3: Translation estimation and correction 平移估计与修正
Output: Accurate 3D model creation 准确的三维模型生成
9.5 [9.5] 2503.17798 GaussianFocus: Constrained Attention Focus for 3D Gaussian Splatting
[{'name': 'Zexu Huang, Min Xu, Stuart Perry'}]
Neural Rendering 神经渲染 v2
3D Gaussian Splatting
neural rendering
photo-realistic rendering
Input: 3D Gaussian representations 三维高斯表示
Step1: Patch attention algorithm application 局部关注算法应用
Step2: Gaussian constraints implementation 高斯约束实施
Step3: Subdivision strategy for large scenes 大场景分割策略
Output: Enhanced rendering quality 改进的渲染质量
9.5 [9.5] 2503.17814 LightLoc: Learning Outdoor LiDAR Localization at Light Speed
[{'name': 'Wen Li, Chen Liu, Shangshu Yu, Dunqiang Liu, Yin Zhou, Siqi Shen, Chenglu Wen, Cheng Wang'}]
Autonomous Driving 自动驾驶 v2
LiDAR localization
SLAM
autonomous driving
Input: LiDAR data LiDAR数据
Step1: Sample classification guidance 样本分类指导
Step2: Redundant sample downsampling 冗余样本下采样
Step3: Integration into SLAM and model evaluation 集成到SLAM并进行模型评估
Output: Fast-trainable localization model 快速可训练的定位模型
9.5 [9.5] 2503.17856 ClaraVid: A Holistic Scene Reconstruction Benchmark From Aerial Perspective With Delentropy-Based Complexity Profiling
[{'name': 'Radu Beche, Sergiu Nedevschi'}]
3D Reconstruction 三维重建 v2
3D reconstruction
aerial imagery
dataset creation
scene complexity
Input: Aerial imagery from UAV 无人机航拍图像
Step1: Dataset creation 数据集创建
Step2: Scene complexity profiling 场景复杂度分析
Step3: Benchmarking reconstruction methods 重建方法基准测试
Output: High-quality synthetic dataset 高质量合成数据集
9.5 [9.5] 2503.17973 PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos
[{'name': 'Hanxiao Jiang, Hao-Yu Hsu, Kaifeng Zhang, Hsin-Ni Yu, Shenlong Wang, Yunzhu Li'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
physics-informed models
robotic motion planning
deformable objects
real-time simulation
Input: Sparse videos of deformable objects 变形物体的稀疏视频
Step1: Develop physics-informed representation 发展物理信息表示
Step2: Integrate inverse modeling framework 整合反向建模框架
Step3: Optimize geometry and physical properties 优化几何和物理属性
Output: Interactive digital twin 交互式数字孪生
9.5 [9.5] 2503.18007 SymmCompletion: High-Fidelity and High-Consistency Point Cloud Completion with Symmetry Guidance
[{'name': 'Hongyu Yan, Zijun Li, Kunming Luo, Li Lu, Ping Tan'}]
Point Cloud Processing 点云处理 v2
Point cloud completion
3D reconstruction
symmetry guidance
Input: Partial point clouds
Step1: Local Symmetry Transformation Network (LSTNet) estimates point-wise local symmetry transformations.
Step2: Generate geometry-aligned partial-missing pairs and initial point clouds.
Step3: Symmetry-Guidance Transformer (SGFormer) refines the initial point clouds using geometric features.
Output: High-fidelity and geometry-consistency final point clouds.
9.5 [9.5] 2503.18100 M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving
[{'name': 'Xuesong Chen, Shaoshuai Shi, Tao Ma, Jingqiu Zhou, Simon See, Ka Chun Cheung, Hongsheng Li'}]
3D Detection 三维检测 v2
3D detection 三维检测
autonomous driving 自动驾驶
multi-task learning 多任务学习
Input: Multimodal data from sensors and cameras 多模态传感器和相机数据
Step1: Feature extraction from images and LiDAR features 从图像和LiDAR特征提取
Step2: Modality-Adaptive Feature Integration (MAFI) module implementation 实现模态自适应特征集成(MAFI)模块
Step3: Task-specific query initialization for detection and segmentation 目标检测和分割的任务特定查询初始化
Step4: Shared BEV features transformation through multi-layer decoders 共享BEV特征的多层解码器变换
Output: Enhanced detection, segmentation, and occupancy prediction results 改进的检测、分割和占用预测结果
9.5 [9.5] 2503.18135 MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
[{'name': 'Jiaxin Huang, Runnan Chen, Ziwen Li, Zhengqing Gao, Xiao He, Yandong Guo, Mingming Gong, Tongliang Liu'}]
3D Reasoning Segmentation 3D推理分割 v2
3D reasoning segmentation
multimodal learning
user intent
Input: Multi-view images and text queries 多视角图像和文本查询
Step1: Generate multi-view pseudo segmentation masks 生成多视角伪分割掩模
Step2: Unproject 2D masks into 3D space 将2D掩模投影到3D空间
Step3: Align masks with text embeddings 将掩模与文本嵌入对齐
Step4: Implement spatial consistency strategy 实施空间一致性策略
Output: Coherent 3D segmentation masks 输出一致的3D分割掩模
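Step2's unprojection of 2D masks into 3D is the standard pinhole back-projection X = z · K⁻¹ [u, v, 1]ᵀ. A minimal sketch with assumed array layouts (not the paper's code):

```python
import numpy as np

def unproject_mask(mask, depth, K):
    """Lift masked pixels to 3D camera-frame points via the pinhole
    model. mask: (H, W) bool, depth: (H, W) in meters, K: (3, 3)
    camera intrinsics. Returns (M, 3) points."""
    v, u = np.nonzero(mask)                 # pixel rows/cols inside the mask
    z = depth[v, u]                         # per-pixel metric depth
    pix = np.stack([u, v, np.ones_like(u)], 0).astype(float)  # (3, M) homogeneous
    rays = np.linalg.inv(K) @ pix           # normalized camera rays
    return (rays * z).T                     # scale rays by depth
```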
9.5 [9.5] 2503.18361 NeRFPrior: Learning Neural Radiance Field as a Prior for Indoor Scene Reconstruction
[{'name': 'Wenyuan Zhang, Emily Yue-ting Jia, Junsheng Zhou, Baorui Ma, Kanle Shi, Yu-Shen Liu'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
neural radiance fields
multi-view consistency
surface reconstruction
signed distance function
Input: Multi-view RGB images 多视角RGB图像
Step1: Learn neural radiance fields using volume rendering 学习使用体积渲染的神经辐射场
Step2: Impose multi-view consistency constraint 强加多视角一致性约束
Step3: Infer signed distance fields (SDF) 推断有符号距离场
Step4: Evaluate surface reconstruction against benchmarks 评估表面重建结果
Output: Reconstructed indoor scene surfaces 重建的室内场景表面
9.5 [9.5] 2503.18363 MonoInstance: Enhancing Monocular Priors via Multi-view Instance Alignment for Neural Rendering and Reconstruction
[{'name': 'Wenyuan Zhang, Yixiao Yang, Han Huang, Liang Han, Kanle Shi, Yu-Shen Liu'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
neural rendering
monocular depth
multi-view
uncertainty
Input: Multi-view images 多视角图像
Step1: Segment multi-view images into consistent instances 将多视角图像分割为一致的实例
Step2: Back-project and align estimated depth values 将估计的深度值反投影并对齐
Step3: Evaluate point density to measure uncertainty 评估点密度以测量不确定性
Output: Uncertainty maps and enhanced geometric priors 不确定性图和增强几何先验
9.5 [9.5] 2503.18368 MoST: Efficient Monarch Sparse Tuning for 3D Representation Learning
[{'name': 'Xu Han, Yuan Tang, Jinfeng Xu, Xianzhi Li'}]
3D Representation Learning 3D表示学习 v2
3D representation learning
parameter-efficient fine-tuning
point clouds
Input: 3D point clouds 3D点云
Step1: Parameter-efficient fine-tuning using structured matrices 使用结构化矩阵进行参数高效微调
Step2: Model training and evaluation 模型训练与评估
Output: Enhanced representation for 3D tasks 改进的3D任务表示
9.5 [9.5] 2503.18402 DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds
[{'name': 'Youyu Chen, Junjun Jiang, Kui Jiang, Xiao Tang, Zhihao Li, Xianming Liu, Yinyu Nie'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting 3D高斯点云
optimization 优化
rendering 渲染
Input: 3D scenes 3D场景
Step1: Optimization complexity analysis 优化复杂度分析
Step2: Scheduling rendering resolution 渲染分辨率调度
Step3: Adaptive primitive growth 自适应基元增长
Output: Accelerated 3D Gaussian Splatting model 加速的3D高斯点云模型
9.5 [9.5] 2503.18438 ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation
[{'name': 'Guosheng Zhao, Xiaofeng Wang, Chaojun Ni, Zheng Zhu, Wenkang Qin, Guan Huang, Xingang Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
autonomous driving
Input: Multi-view images 多视角图像
Step1: Domain gap mitigation 域间隙缓解
Step2: Spatial deformation learning 空间变形学习
Step3: 3D Gaussian modeling 三维高斯建模
Output: Improved driving scene representation 改进的驾驶场景表示
9.5 [9.5] 2503.18458 StableGS: A Floater-Free Framework for 3D Gaussian Splatting
[{'name': 'Luchao Wang, Qian Ren, Kaiming He, Hua Wang, Zhi Chen, Yaohua Tang'}]
Neural Rendering 神经渲染 v2
3D Gaussian Splatting
novel view synthesis
floater artifacts
Input: 3D Gaussian Splatting data 3D高斯点云数据
Step1: Analyze gradient vanishing 梯度消失分析
Step2: Develop cross-view depth consistency constraints 开发视图间深度一致性约束
Step3: Integrate a dual-opacity model 集成双透明度模型
Output: Enhanced novel view synthesis results 改进的新视图合成结果
9.5 [9.5] 2503.18461 MuMA: 3D PBR Texturing via Multi-Channel Multi-View Generation and Agentic Post-Processing
[{'name': 'Lingting Zhu, Jingrui Ye, Runze Zhang, Zeyu Hu, Yingda Yin, Lanjiong Li, Jinnan Chen, Shengju Qian, Xin Wang, Qingmin Liao, Lequan Yu'}]
3D Generation 三维生成 v2
3D PBR Texturing
Multi-Channel Generation
Agentic Post-Processing
Input: Untextured mesh and user inputs 未纹理化网格与用户输入
Step1: Multi-channel multi-view generation 多通道多视角生成
Step2: Agentic post-processing 代理后处理
Output: High-fidelity PBR textures 高保真物理基础渲染纹理
9.5 [9.5] 2503.18476 Global-Local Tree Search for Language Guided 3D Scene Generation
[{'name': 'Wei Deng, Mengshi Qi, Huadong Ma'}]
3D Scene Generation 3D场景生成 v2
3D indoor scene generation
Vision-Language Models (VLMs)
tree search algorithm
Input: User-provided scene descriptions 用户提供的场景描述
Step1: Hierarchical scene representation construction 层次场景表示构建
Step2: Global-local tree search algorithm application 全局-局部树搜索算法应用
Step3: Object placement using VLM object recognition 使用VLM对象识别进行物体放置
Output: Realistic 3D indoor scenes 真实的室内3D场景
9.5 [9.5] 2503.18527 AIM2PC: Aerial Image to 3D Building Point Cloud Reconstruction
[{'name': 'Soulaimene Turki, Daniel Panangian, Houda Chaabouni-Chouayakh, Ksenia Bittner'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
point cloud
building reconstruction
aerial image
Input: Single aerial image 单幅航空图像
Step1: Feature extraction 特征提取
Step2: Concatenate additional conditions 连接附加条件
Step3: Point cloud diffusion modeling 点云扩散建模
Output: Complete 3D building point cloud 生成完整的三维建筑点云
9.5 [9.5] 2503.18557 LeanStereo: A Leaner Backbone based Stereo Network
[{'name': 'Rafia Rahim, Samuel Woerz, Andreas Zell'}]
Stereo Matching 立体匹配 v2
Stereo Matching
Depth Estimation
3D Reconstruction
Input: Rectified stereo images 经过校正的立体图像
Step1: Feature extraction 特征提取
Step2: Cost volume integration 成本体积集成
Step3: Disparity regression 视差回归
Output: Depth map 深度图
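The final disparity-to-depth step in a rectified stereo pipeline like this one is a fixed geometric relation: for focal length f (pixels) and baseline B (meters), depth Z = f · B / d. A minimal sketch (names are illustrative, not from the paper):

```python
def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity value (pixels) to metric depth (meters)
    for a rectified stereo pair: Z = f * B / d."""
    if disparity <= eps:
        # Near-zero disparity means the point is effectively at
        # infinity or the match is invalid.
        return float("inf")
    return focal_px * baseline_m / disparity
```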
9.5 [9.5] 2503.18640 LLGS: Unsupervised Gaussian Splatting for Image Enhancement and Reconstruction in Pure Dark Environment
[{'name': 'Haoran Wang, Jingwei Huang, Lu Yang, Tianchen Deng, Gaojing Zhang, Mingrui Li'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
multi-view optimization
low-light enhancement
Input: Low-light images 低光照图像
Step1: Gaussian representation decomposition 高斯表示分解
Step2: Image enhancement 图像增强
Step3: Multi-view consistency optimization 多视角一致性优化
Output: Enhanced 3D models 改进的三维模型
9.5 [9.5] 2503.18671 Structure-Aware Correspondence Learning for Relative Pose Estimation
[{'name': 'Yihan Chen, Wenfei Yang, Huan Ren, Shifeng Zhang, Tianzhu Zhang, Feng Wu'}]
3D Reconstruction and Modeling 三维重建 v2
Relative Pose Estimation 相对姿态估计
3D Correspondences 3D对应
Keypoint Extraction 关键点提取
Structure-Aware 结构感知
Input: Query and reference images 图像输入
Step1: Structure-aware keypoint extraction module 结构感知关键点提取模块
Step2: Structure-aware correspondence estimation module 结构感知对应估计模块
Step3: 3D-3D correspondence establishment 3D-3D对应建立
Output: Estimated relative pose 估计的相对姿态
9.5 [9.5] 2503.18682 Hardware-Rasterized Ray-Based Gaussian Splatting
[{'name': 'Samuel Rota Bulò, Nemanja Bartolovic, Lorenzo Porzi, Peter Kontschieder'}]
Neural Rendering 神经渲染 v2
3D Gaussian Splatting
ray-based rendering
virtual reality
Input: 3D Gaussian primitives 3D 高斯原语
Step1: Mathematical derivation 数学推导
Step2: Efficient rendering techniques 高效渲染技术
Step3: Performance evaluation 性能评估
Output: High-quality rendering output 高质量渲染输出
9.5 [9.5] 2503.18794 NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting
[{'name': 'Yulong Zheng, Zicheng Jiang, Shengfeng He, Yandu Sun, Junyu Dong, Huaidong Zhang, Yong Du'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
sparse view synthesis
Neural Radiance Fields
3D Gaussian Splatting
Input: Sparse-view images 稀疏视图图像
Step 1: Depth computation using optical flow and camera poses 使用光流和相机姿态进行深度计算
Step 2: Point cloud densification 点云密化
Step 3: Model evaluation and comparison 模型评估与比较
Output: Enhanced novel view synthesis 改进的新视图合成
9.5 [9.5] 2503.18897 Online 3D Scene Reconstruction Using Neural Object Priors
[{'name': 'Thomas Chabal, Shizhe Chen, Jean Ponce, Cordelia Schmid'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
neural implicit representations
Input: RGB-D video sequence RGB-D视频序列
Step1: Extract object masks and camera poses 提取物体掩模和相机姿态
Step2: Continuous optimization of object representation 对物体表示进行连续优化
Step3: Utilize shape priors from object library 利用物体库中的形状先验
Output: Online reconstructed 3D scene 在线重建的3D场景
9.5 [9.5] 2503.18945 Aether: Geometric-Aware Unified World Modeling
[{'name': 'Aether Team, Haoyi Zhu, Yifan Wang, Jianjun Zhou, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Chunhua Shen, Jiangmiao Pang, Tong He'}]
3D Reconstruction and Modeling 三维重建与建模 v2
4D reconstruction
autonomous systems
Input: Synthetic 4D video data 合成的4D视频数据
Step1: Data annotation 数据标注
Step2: Multi-task optimization 多任务优化
Step3: Model training and evaluation 模型训练与评估
Output: Unified world model with geometric reasoning 具有几何推理的统一世界模型
9.2 [9.2] 2503.17712 Multi-modality Anomaly Segmentation on the Road
[{'name': 'Heng Gao, Zhuolin He, Shoumeng Qiu, Xiangyang Xue, Jian Pu'}]
Autonomous Systems and Robotics 自动驾驶 v2
anomaly segmentation
autonomous driving
multi-modal
Input: Road images with anomalies 具有异常的路面图像
Step1: Text-modal extraction using CLIP 通过CLIP提取文本模态
Step2: Anomaly score computation 计算异常得分
Step3: Ensemble boosting of scores 加权平均多个得分
Output: Anomaly segmentation map 异常分割图
9.2 [9.2] 2503.18052 SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
[{'name': 'Yue Li, Qi Ma, Runyi Yang, Huapeng Li, Mengjiao Ma, Bin Ren, Nikola Popovic, Nicu Sebe, Ender Konukoglu, Theo Gevers, Luc Van Gool, Martin R. Oswald, Danda Pani Paudel'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
3D scene understanding
self-supervised learning
Input: 3D Gaussian Splatting (3DGS) data 3D高斯点云数据
Step 1: Dataset creation (SceneSplat-7K) 数据集创建(SceneSplat-7K)
Step 2: Model training with vision-language pretraining 通过视觉语言预训练训练模型
Step 3: Evaluate performance on segmentation benchmarks 在分割基准上评估性能
Output: Enhanced understanding of 3D scenes 改进的3D场景理解
9.2 [9.2] 2503.18107 PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding
[{'name': 'Hongjia Zhai, Hai Li, Zhenzhe Li, Xiaokun Pan, Yijia He, Guofeng Zhang'}]
3D Reconstruction and Modeling 三维重建 v2
3D panoptic segmentation
3D Gaussian Splatting
scene understanding
language-guided segmentation
Input: Multi-view posed images 多视角图像
Step1: Model continuous parametric feature space 建模连续参数特征空间
Step2: Use 3D feature decoder 采用三维特征解码器
Step3: Perform graph clustering based segmentation 进行基于图聚类的分割
Output: 3D consistent instance segmentation 三维一致实例分割
9.2 [9.2] 2503.18155 Decorum: A Language-Based Approach For Style-Conditioned Synthesis of Indoor 3D Scenes
[{'name': 'Kelly O. Marshall, Omid Poursaeed, Sergiu Oprea, Amit Kumar, Anushrut Jignasu, Chinmay Hegde, Yilei Li, Rakesh Ranjan'}]
3D Generation 三维生成 v2
3D scene generation
natural language processing
multimodal learning
Input: User-generated prompts 用户生成的提示
Step 1: Text to dense annotation 文本转为密集注释
Step 2: Layout design for objects 设计对象布局
Step 3: Furniture selection from inventory 从库存中选择家具
Output: Structured 3D indoor scenes 结构化三维室内场景
9.2 [9.2] 2503.18254 Surface-Aware Distilled 3D Semantic Features
[{'name': 'Lukas Uzolas, Elmar Eisemann, Petr Kellnhofer'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
semantic features
Input: Training with unpaired 3D meshes 使用无配对3D网格进行训练
Step1: Learning a surface-aware embedding space 学习表面感知嵌入空间
Step2: Implementing a contrastive loss to improve feature distinction 实施对比损失以提高特征区分
Output: Robust 3D features for various applications 适用于多种应用的稳健3D特征
9.0 [9.0] 2503.17574 Is there anything left? Measuring semantic residuals of objects removed from 3D Gaussian Splatting
[{'name': 'Simona Kocour, Assia Benbihi, Aikaterini Adam, Torsten Sattler'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
semantic residuals
object removal
privacy-preserving mapping
3D Gaussian Splatting
Input: 3D scenes with objects 3D场景与对象
Step1: Evaluation of object removal methods 对对象删除方法进行评估
Step2: Measurement of semantic residuals 语义残余的测量
Step3: Refinement of object removal results 根据空间和语义一致性优化删除结果
Output: Evaluated removal quality and refined scenes 评估删除质量和优化后场景
9.0 [9.0] 2503.18083 Unified Geometry and Color Compression Framework for Point Clouds via Generative Diffusion Priors
[{'name': 'Tianxin Huang, Gim Hee Lee'}]
3D Reconstruction and Modeling 三维重建 v2
point cloud compression
generative diffusion models
3D modeling
autonomous driving
Input: 3D point clouds with color attributes 具有颜色属性的3D点云
Step1: Adaptation of pre-trained generative diffusion model 适应预训练生成扩散模型
Step2: Compression using prompt tuning 使用提示调优进行压缩
Step3: Data encoding into sparse sets 将数据编码为稀疏集合
Step4: Decompression through denoising steps 通过去噪步骤进行解压缩
Output: Compressed and decompressed point clouds 压缩和解压缩的点云
9.0 [9.0] 2503.18944 DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation
[{'name': 'Karim Abou Zeid, Kadir Yilmaz, Daan de Geus, Alexander Hermans, David Adrian, Timm Linder, Bastian Leibe'}]
3D Segmentation 3D分割 v2
3D segmentation 3D分割
2D foundation models 2D基础模型
semantic segmentation 语义分割
Input: 2D foundation model features 2D基础模型特征
Step1: Feature extraction 特征提取
Step2: 2D to 3D projection 2D到3D投影
Step3: Integration into 3D segmentation model 集成到3D分割模型中
Output: Enhanced 3D segmentation performance 改进的3D分割性能
8.5 [8.5] 2503.17406 IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes
[{'name': 'Haochen Zhang, Nader Zantout, Pujith Kachana, Ji Zhang, Wenshan Wang'}]
3D Scene Understanding 3D 场景理解 v2
3D scenes
referential grounding
benchmark dataset
multimodal integration
interactive navigation
Input: 3D scanned rooms 3D 扫描房间
Step1: Dataset curation 数据集策划
Step2: Model evaluation 模型评估
Step3: Baseline development 基线开发
Output: Resource for interactive navigation systems 为交互导航系统提供资源
8.5 [8.5] 2503.17415 Enhancing Subsequent Video Retrieval via Vision-Language Models (VLMs)
[{'name': 'Yicheng Duan, Xi Huang, Duo Chen'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models (VLMs)
Video Retrieval 视频检索
Contextual Relationships 上下文关系
Input: Video segments 视频段
Step1: Embed video frames using VLM 使用视觉语言模型(VLM)嵌入视频框架
Step2: Combine embeddings with contextual metadata 结合嵌入与上下文元数据
Step3: Implement vector similarity search with graph structures 实现与图结构的向量相似性搜索
Output: Refined video retrieval results 精细化的视频检索结果
8.5 [8.5] 2503.17499 Event-Based Crossing Dataset (EBCD)
[{'name': 'Joey Mulé, Dhandeep Challagundla, Rachit Saini, Riadul Islam'}]
Event-based Vision 事件视觉 v2
Event-based vision 事件视觉
object detection 目标检测
autonomous systems 自主系统
Input: Event-based images 事件图像
Step1: Data capture using multi-thresholding 多阈值数据捕获
Step2: Object detection using CNNs 使用卷积神经网络进行目标检测
Step3: Performance evaluation against traditional datasets 性能评估与传统数据集对比
Output: Enhanced dataset for event-based detection 改进的事件检测数据集
8.5 [8.5] 2503.17539 Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks
[{'name': 'Bhishma Dedhia, David Bourgin, Krishna Kumar Singh, Yuheng Li, Yan Kang, Zhan Xu, Niraj K. Jha, Yuchen Liu'}]
Image and Video Generation 图像生成与视频生成 v2
video generation
parallel inference
Diffusion Transformers
temporal consistency
Input: Short videos 短视频
Step1: Encoding video chunks 编码视频块
Step2: Parallel inference of video segments 视频段落的并行推理
Step3: Denoising video chunks 去噪视频块
Output: Long photorealistic videos 长的照相真实视频
8.5 [8.5] 2503.17695 MotionDiff: Training-free Zero-shot Interactive Motion Editing via Flow-assisted Multi-view Diffusion
[{'name': 'Yikun Ma, Yiqing Li, Jiawei Wu, Zhi Jin'}]
Multi-view and Stereo Vision 多视角立体视觉 v2
motion editing
multi-view consistency
generative models
optical flow
Input: Static scene and user-selected motion priors 静态场景和用户选择的运动先验
Step1: Multi-view Flow Estimation Stage (MFES) 多视角流估计阶段
Step2: Point Kinematic Model (PKM) to estimate optical flows 使用点运动模型估计光流
Step3: Multi-view Motion Diffusion Stage (MMDS) to generate motion results 多视角运动扩散阶段生成运动结果
Output: Consistent multi-view motion results 一致的多视角运动结果
8.5 [8.5] 2503.17752 HiLoTs: High-Low Temporal Sensitive Representation Learning for Semi-Supervised LiDAR Segmentation in Autonomous Driving
[{'name': 'R. D. Lin, Pengcheng Weng, Yinqiao Wang, Han Ding, Jinsong Han, Fei Wang'}]
LiDAR Segmentation 激光雷达分割 v2
LiDAR segmentation
autonomous driving
semi-supervised learning
Input: Continuous LiDAR frames 连续的激光雷达帧
Step1: Learn high and low temporal sensitivity representations 学习高低时间敏感性表示
Step2: Enhance representations with a cross-attention mechanism 使用交叉注意力机制增强表示
Step3: Align representations across labeled and unlabeled branches in a teacher-student framework 在标签和未标签分支上对齐表示
Output: Segmentation results based on LiDAR frames 基于激光雷达帧的分割结果
8.5 [8.5] 2503.17788 Aligning Foundation Model Priors and Diffusion-Based Hand Interactions for Occlusion-Resistant Two-Hand Reconstruction
[{'name': 'Gaoge Han, Yongkang Cheng, Zhe Chen, Shaoli Huang, Tongliang Liu'}]
3D Reconstruction and Modeling 三维重建 v2
3D hand reconstruction
occlusion handling
multimodal prior integration
diffusion models
fusion alignment
Input: Monocular images 单目图像
Step1: Learn to align fused multimodal priors (keypoints, segmentation maps, depth cues) from foundation models during training 训练期间学习对齐融合的多模态先验(关键点、分割图、深度线索)
Step2: Employ a two-hand diffusion model to correct interpenetration artifacts 应用双手扩散模型以修正穿透伪影
Output: Occlusion-resistant two-hand reconstruction 具抗遮挡能力的双手重建
8.5 [8.5] 2503.17938 Selecting and Pruning: A Differentiable Causal Sequentialized State-Space Model for Two-View Correspondence Learning
[{'name': 'Xiang Fang, Shihua Zhang, Hao Zhang, Tao Lu, Huabing Zhou, Jiayi Ma'}]
Multi-view and Stereo Vision 多视角和立体视觉 v2
correspondence learning
two-view matching
SFM
pose estimation
Input: Two-view image pairs 两幅图像对
Step1: Develop correspondence filter 研发对应过滤器
Step2: Implement causal sequence learning 实现因果序列学习
Step3: Integrate local-context enhancement module 集成局部上下文增强模块
Step4: Evaluate performance on relative pose estimation 评估相对姿态估计的性能
Output: Enhanced matching accuracy 提升匹配精度
8.5 [8.5] 2503.17982 Co-SemDepth: Fast Joint Semantic Segmentation and Depth Estimation on Aerial Images
[{'name': 'Yara AlaaEldin, Francesca Odone'}]
Depth Estimation 深度估计 v2
depth estimation
semantic segmentation
aerial images
autonomous navigation
Input: Aerial images from monocular cameras 单目相机的航拍图像
Step1: Joint architecture design 结构设计
Step2: Depth estimation map prediction 深度估计图的预测
Step3: Semantic segmentation map prediction 语义分割图的预测
Output: Depth and semantic segmentation maps 深度和语义分割图
8.5 [8.5] 2503.17992 Geometric Constrained Non-Line-of-Sight Imaging
[{'name': 'Xueying Liu, Lianfang Wang, Jun Liu, Yong Wang, Yuping Duan'}]
3D Reconstruction and Modeling 三维重建 v2
Non-line-of-sight imaging
3D reconstruction
surface normal
geometric constraint
Input: Non-line-of-sight (NLOS) data 非视距(NLOS)数据
Step1: Joint estimation of normals and albedo 法线与反照率的联合估计
Step2: Apply Frobenius norm regularization 应用弗罗贝纽斯范数正则化
Step3: High-precision surface reconstruction 提高准确性的表面重建
Output: Accurate geometry of hidden objects 隐藏物体的准确几何形状
8.5 [8.5] 2503.18016 Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook
[{'name': 'Xu Zheng, Ziqiao Weng, Yuanhuiyi Lyu, Lutao Jiang, Haiwei Xue, Bin Ren, Danda Paudel, Nicu Sebe, Luc Van Gool, Xuming Hu'}]
Image and Video Generation 图像生成与视频生成 v2
Retrieval-Augmented Generation
computer vision
3D generation
Input: A comprehensive overview of retrieval-augmented generation techniques in computer vision 计算机视觉中的检索增强生成技术概述
Step1: Review of visual understanding tasks 视觉理解任务评估
Step2: Examination of visual generation applications 视觉生成应用调查
Step3: Proposal of future research directions 提出未来研究方向
Output: Insights into RAG applications in 3D generation and embodied AI 3D生成和实体AI中的RAG应用见解
8.5 [8.5] 2503.18073 PanopticSplatting: End-to-End Panoptic Gaussian Splatting
[{'name': 'Yuxuan Xie, Xuan Yu, Changjian Jiang, Sitong Mao, Shunbo Zhou, Rui Fan, Rong Xiong, Yue Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
Gaussian splatting
Input: Multi-view images 多视角图像
Step1: Gaussian segmentation 高斯分割
Step2: Label blending 标签混合
Step3: Cross attention mechanism 交叉注意机制
Output: Consistent 3D panoptic segments 一致的三维全景分段
8.5 [8.5] 2503.18177 Training A Neural Network For Partially Occluded Road Sign Identification In The Context Of Autonomous Vehicles
[{'name': 'Gulnaz Gimaletdinova, Dim Shaiakhmetov, Madina Akpaeva, Mukhammadmuso Abduzhabbarov, Kadyrmamat Momunov'}]
Robotic Perception 机器人感知 v2
traffic sign recognition
partial occlusion
autonomous vehicles
CNN
Input: Dataset of road sign images with occlusions 道路标志图像数据集(包含遮挡)
Step1: Data collection 数据收集
Step2: Model training using CNN models 模型训练使用卷积神经网络
Step3: Comparison of models against transfer learning models 模型与迁移学习模型比较
Output: Performance metrics of models 模型性能指标
8.5 [8.5] 2503.18283 Voxel-based Point Cloud Geometry Compression with Space-to-Channel Context
[{'name': 'Bojun Liu, Yangzhi Ma, Ao Luo, Li Li, Dong Liu'}]
Point Cloud Processing 点云处理 v2
Point cloud compression 点云压缩
Sparse convolution 稀疏卷积
Input: Point cloud data 点云数据
Step1: Context model development 上下文模型开发
Step2: Geometry residual coding development 几何残差编码开发
Step3: Performance evaluation 性能评估
Output: Compressed point cloud representation 压缩点云表示
8.5 [8.5] 2503.18328 TensoFlow: Tensorial Flow-based Sampler for Inverse Rendering
[{'name': 'Chun Gu, Xiaofei Wei, Li Zhang, Xiatian Zhu'}]
Inverse Rendering 逆向渲染 v2
inverse rendering
importance sampling
3D reconstruction
multi-view images
Input: Multi-view images 多视角图像
Step1: Importance sampling 重要性采样
Step2: Sampler learning 学习采样器
Step3: Scene representation 场景表示
Output: Enhanced rendering outputs 改进的渲染输出
8.5 [8.5] 2503.18341 PS-EIP: Robust Photometric Stereo Based on Event Interval Profile
[{'name': 'Kazuma Kitazawa, Takahito Aoto, Satoshi Ikehata, Tsuyoshi Takatani'}]
3D Reconstruction and Modeling 三维重建 v2
Photometric Stereo 光度立体
Event Camera 事件摄像机
3D Reconstruction 三维重建
Input: Event data from an event camera 事件摄像头的数据
Step1: Formulate event interval profiles 形成事件间隔剖面
Step2: Introduce outlier detection based on profile shape 引入基于剖面形状的异常值检测
Step3: Estimate surface normals using the derived profiles 使用推导的剖面估计表面法线
Output: Robustly estimated surface normals 可靠的表面法线估计
8.5 [8.5] 2503.18384 LiDAR Remote Sensing Meets Weak Supervision: Concepts, Methods, and Perspectives
[{'name': 'Yuan Gao, Shaobo Xia, Pu Wang, Xiaohuan Xi, Sheng Nie, Cheng Wang'}]
3D Reconstruction and Modeling 三维重建与建模 v2
LiDAR remote sensing
weakly supervised learning
3D reconstruction
point clouds
Input: LiDAR data and annotations LiDAR数据和注释
Step1: Review of LiDAR interpretation and inversion research 激光雷达解译和反演的研究现状
Step2: Summary of weakly supervised techniques 弱监督技术的总结
Step3: Discussion of future research directions 未来研究方向的讨论
Output: Comprehensive review of LiDAR remote sensing 综述LiDAR遥感
8.5 [8.5] 2503.18393 PDDM: Pseudo Depth Diffusion Model for RGB-PD Semantic Segmentation Based in Complex Indoor Scenes
[{'name': 'Xinhua Xu, Hong Liu, Jianbing Wu, Jinfu Liu'}]
Image and Video Generation 图像生成 v2
RGB segmentation
pseudo depth
semantic segmentation
Input: RGB images and pseudo depth maps RGB图像和伪深度图
Step1: Generate pseudo depth maps 生成伪深度图
Step2: Integrate RGB and pseudo depth 结合RGB和伪深度
Step3: Apply Pseudo Depth Aggregation Module (PDAM) 应用伪深度聚合模块 (PDAM)
Step4: Utilize diffusion model for feature extraction 利用扩散模型进行特征提取
Output: Segmentation results 分割结果
8.5 [8.5] 2503.18408 Fast and Physically-based Neural Explicit Surface for Relightable Human Avatars
[{'name': 'Jiacheng Wu, Ruiqi Zhang, Jie Chen, Hui Zhang'}]
Neural Rendering 神经渲染 v2
3D reconstruction 三维重建
neural rendering 神经渲染
autonomous systems 自动驾驶
Input: Sparse-view videos 稀疏视图视频
Step1: Learning pose-dependent geometry and texture 学习与姿态相关的几何和纹理
Step2: Physically-based rendering and relighting 物理基础渲染与重光照
Output: Relightable human avatars 可重光照的人类化身
8.5 [8.5] 2503.18421 4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video
[{'name': 'Qiang Hu, Zihan Zheng, Houqiang Zhong, Sihua Fu, Li Song, Xiaoyun Zhang, Guangtao Zhai, Yanfeng Wang'}]
3D Reconstruction and Modeling 三维重建 v2
4D Gaussian compression
Free-Viewpoint Video
motion-aware representation
Input: Free-Viewpoint Video (FVV) sequences 自由视角视频序列
Step1: Establish dynamic Gaussian representation 建立动态高斯表示
Step2: Integrate motion-aware encoding 结合运动感知编码
Step3: Optimize rate-distortion trade-off 优化速率-失真权衡
Output: Compressed and rendered FVV 经过压缩和渲染的FVV
8.5 [8.5] 2503.18470 MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
[{'name': 'Zhenyu Pan, Han Liu'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D spatial reasoning
reinforcement learning
vision-language models
scene generation
Input: Room image, user preferences, and object status 房间图像、用户偏好和物体状态
Step1: Generate a reasoning trace alongside a JSON-formatted layout 生成推理跟踪和JSON格式布局
Step2: Evaluate layout using reward signals 通过奖励信号评估布局
Step3: Optimize spatial structures through reinforcement learning 通过强化学习优化空间结构
Output: Enhanced 3D scene generation 改进的三维场景生成
8.5 [8.5] 2503.18513 LookCloser: Frequency-aware Radiance Field for Tiny-Detail Scene
[{'name': 'Xiaoyu Zhang, Weihong Pan, Chong Bao, Xiyu Zhang, Xiaojun Xiang, Hanqing Jiang, Hujun Bao'}]
3D Rendering 三维渲染 v2
Neural Radiance Fields
3D rendering
view synthesis
frequency analysis
autonomous systems
Input: Scenes with varying frequency details 场景含有变化的频率细节
Step1: 3D frequency quantification 进行3D频率量化
Step2: Frequency-aware rendering 实现频率感知渲染
Step3: Model evaluation and comparison with baselines 评估模型并与基准进行比较
Output: High-fidelity view synthesis outputs 高保真视图合成输出
8.5 [8.5] 2503.18540 HiRes-FusedMIM: A High-Resolution RGB-DSM Pre-trained Model for Building-Level Remote Sensing Applications
[{'name': 'Guneet Mutreja, Philipp Schuegraf, Ksenia Bittner'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D reconstruction
digital surface models
building analysis
remote sensing
Input: High-resolution RGB and DSM data 高分辨率RGB和DSM数据
Step1: Data curation and pairing 数据策划和配对
Step2: Dual-encoder architecture development 双编码器架构开发
Step3: Joint representation learning 联合表示学习
Step4: Comprehensive evaluation on downstream tasks 全面评估下游任务
Output: Improved performance on building-level analysis 改进的建筑水平分析性能
8.5 [8.5] 2503.18541 UniPCGC: Towards Practical Point Cloud Geometry Compression via an Efficient Unified Approach
[{'name': 'Kangli Wang, Wei Gao'}]
Point Cloud Processing 点云处理 v2
point cloud compression
3D reconstruction
efficiency
variable rate
Input: Point cloud data 点云数据
Step1: Implement Uneven 8-Stage Lossless Coder (UELC) 在无损模式下实施不均匀8阶段无损编码器 (UELC)
Step2: Apply Variable Rate and Complexity Module (VRCM) 在有损模式下应用变量速率和复杂性模块 (VRCM)
Step3: Combine UELC and VRCM 动态组合UELC和VRCM
Output: Compressed point cloud representations 压缩点云表示
8.5 [8.5] 2503.18544 Distilling Stereo Networks for Performant and Efficient Leaner Networks
[{'name': 'Rafia Rahim, Samuel Woerz, Andreas Zell'}]
Multi-view Stereo 多视角立体 v2
Knowledge Distillation
Stereo Matching
Depth Estimation
Input: Stereo image pairs 立体图像对
Step1: Design of student network 学生网络设计
Step2: Knowledge distillation pipeline 知识蒸馏流水线
Step3: Evaluation of network performance 网络性能评估
Output: Leaner and faster student networks 精简且快速的学生网络
8.5 [8.5] 2503.18631 Robust Lane Detection with Wavelet-Enhanced Context Modeling and Adaptive Sampling
[{'name': 'Kunyang Li, Ming Hou'}]
Autonomous Driving 自动驾驶 v2
lane detection
autonomous driving
Input: Multi-view images 多视角图像
Step1: Data integration 数据集成
Step2: Algorithm development 算法开发
Step3: Model evaluation 模型评估
Output: Enhanced 3D models 改进的三维模型
8.5 [8.5] 2503.18673 Any6D: Model-free 6D Pose Estimation of Novel Objects
[{'name': 'Taeyeop Lee, Bowen Wen, Minjun Kang, Gyuree Kang, In So Kweon, Kuk-Jin Yoon'}]
3D Reconstruction and Modeling 三维重建 v2
6D pose estimation
object detection
computer vision
Input: Single RGB-D anchor image 单个RGB-D锚图像
Step1: Joint object alignment process 物体对齐处理
Step2: Render-and-compare strategy 渲染与比较策略
Step3: Pose hypothesis generation 生成目标假设
Output: Accurate 6D pose and size estimation 准确的6D姿势和尺寸估计
8.5 [8.5] 2503.18711 Accenture-NVS1: A Novel View Synthesis Dataset
[{'name': "Thomas Sugg, Kyle O'Brien, Lekh Poudel, Alex Dumouchelle, Michelle Jou, Marc Bosch, Deva Ramanan, Srinivasa Narasimhan, Shubham Tulsiani"}]
Novel View Synthesis 新颖视图合成 v2
novel view synthesis
3D reconstruction
multi-view scenes
Input: Multi-view images 多视角图像
Step1: Data collection 数据收集
Step2: Calibration and geolocation 校准与地理定位
Step3: Dataset integration 数据集整合
Output: ACC-NVS1 dataset ACC-NVS1 数据集
8.5 [8.5] 2503.18718 GS-Marker: Generalizable and Robust Watermarking for 3D Gaussian Splatting
[{'name': 'Lijiang Li, Jinglu Wang, Xiang Ming, Yan Lu'}]
Neural Rendering 神经渲染 v2
3D Gaussian Splatting
watermarking
3D models
Input: 3D Gaussian models 3D高斯模型
Step1: Message embedding 消息嵌入
Step2: Distortion enhancement 扭曲增强
Step3: Watermark extraction 水印提取
Output: Robust watermarked models 可靠水印模型
8.5 [8.5] 2503.18725 FG$^2$: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching
[{'name': 'Zimin Xia, Alexandre Alahi'}]
3D Localization and Mapping 3D定位与地图构建 v2
3D localization
fine-grained feature matching
autonomous navigation
Input: Ground-level image and aerial image 地面图像与航空图像
Step1: Map ground image features to 3D point cloud 将地面图像特征映射到3D点云
Step2: Select features along height dimension 选择高度维度的特征
Step3: Compute point correspondences using Procrustes alignment 使用Procrustes对齐计算点对应关系
Output: Estimated 3 Degrees of Freedom pose of the ground image 估计地面图像的3个自由度姿态
8.5 [8.5] 2503.18767 Good Keypoints for the Two-View Geometry Estimation Problem
[{'name': 'Konstantin Pakulev, Alexander Vakhitov, Gonzalo Ferrer'}]
Visual Odometry 视觉里程计 v2
keypoint detection 关键点检测
homography estimation 单应性估计
structure from motion 运动结构估计
visual SLAM 视觉SLAM
Input: Image pairs for keypoint detection 图像对用于关键点检测
Step1: Develop a theoretical model for keypoint scoring 建立关键点评分的理论模型
Step2: Identify properties of good keypoints 确定良好关键点的特性
Step3: Design and implement the BoNeSS-ST keypoint detector 设计并实现BoNeSS-ST关键点检测器
Output: Enhanced keypoint performance 改进的关键点表现
8.5 [8.5] 2503.18853 3DSwapping: Texture Swapping For 3D Object From Single Reference Image
[{'name': 'Xiao Cao, Beibei Lin, Bo Wang, Zhiyong Huang, Robby T. Tan'}]
3D Reconstruction and Modeling 三维重建 v2
3D texture swapping
view consistency
gradient guidance
Input: Single reference image 单个参考图像
Step1: Progressive generation 逐步生成
Step2: View-consistency gradient guidance 视图一致性梯度引导
Step3: Prompt-tuning based guidance 提示调优引导
Output: High-fidelity texture swaps 高保真纹理交换
8.5 [8.5] 2503.18903 Building Blocks for Robust and Effective Semi-Supervised Real-World Object Detection
[{'name': 'Moussa Kassem Sbeyti, Nadja Klein, Azarm Nowzad, Fikret Sivrikaya, Sahin Albayrak'}]
Object Detection 目标检测 v2
semi-supervised object detection
autonomous driving
pseudo-labeling
Input: Real-world datasets with labeled and unlabeled data 真实世界数据集,含标记和未标记数据
Step1: Identify challenges in SSOD under real conditions 确定真实条件下的半监督目标检测中的挑战
Step2: Propose building blocks for performance improvement 提出性能改进的构建模块
Step3: Validate the methods through experiments on autonomous driving datasets 通过在自动驾驶数据集上的实验验证方法
Output: Enhanced semi-supervised object detection performance 改进的半监督目标检测性能
8.5 [8.5] 2503.18933 SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction
[{'name': 'Enrico Pallotta, Sina Mokhtarzadeh Azar, Shuai Li, Olga Zatsarynna, Juergen Gall'}]
Image and Video Generation 图像生成与视频生成 v2
video prediction
multi-modal
depth
RGB
Input: Past video frames 过去的视频帧
Step1: Modality integration 模态集成
Step2: Multi-modal video prediction 多模态视频预测
Step3: Performance evaluation 性能评估
Output: Future video frames 未来的视频帧
8.5 [8.5] 2503.18950 Target-Aware Video Diffusion Models
[{'name': 'Taeksoo Kim, Hanbyul Joo'}]
Image and Video Generation 图像生成与视频生成 v2
video generation
human-object interaction
robotics
3D motion synthesis
Input: An input image and segmentation mask to indicate the target
Step1: Extend a baseline image-to-video diffusion model to incorporate the target mask
Step2: Introduce a special token to describe the target in the text prompt
Step3: Fine-tune the model using a novel cross-attention loss
Output: Generated video of actor interacting with the specified target
8.0 [8.0] 2503.18556 Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models
[{'name': 'Bin Li, Dehong Gao, Yeyuan Wang, Linbo Jin, Shanqing Yu, Xiaoyan Cai, Libin Yang'}]
VLM & VLA 视觉语言模型与对齐 v2
Large Vision-Language Models
hallucinations
contrastive decoding
Input: Image tokens 图像标记
Step1: Attention calculation 注意力计算
Step2: Instruction-based adjustment 基于指令的调整
Step3: Contrastive decoding 对比解码
Output: Adjusted logits 调整后的逻辑值
7.5 [7.5] 2503.17700 MAMAT: 3D Mamba-Based Atmospheric Turbulence Removal and its Object Detection Capability
[{'name': 'Paul Hill, Zhiming Liu, Nantheera Anantrasirichai'}]
3D Reconstruction and Modeling 三维重建 v2
3D convolutions
atmospheric turbulence
object detection
Input: Video sequences affected by atmospheric turbulence 受大气湍流影响的视频序列
Step1: Non-rigid registration using deformable 3D convolutions 采用可变形3D卷积进行非刚性配准
Step2: Contrast and detail enhancement using 3D Mamba architecture 采用3D Mamba架构进行对比度和细节增强
Output: Enhanced video with improved object detection capabilities 提升视频质量并改善物体检测能力
7.5 [7.5] 2503.18278 TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
[{'name': 'Cheng Yang, Yang Sui, Jinqi Xiao, Lingyi Huang, Yu Gong, Chendi Li, Jinghua Yan, Yu Bai, Ponnuswamy Sadayappan, Xia Hu, Bo Yuan'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
token pruning
Vision-Language Models
optimization
Input: Visual tokens 视觉标记
Step1: Token selection 选择标记
Step2: Optimization formulation 优化公式化
Step3: Pruning execution 修剪执行
Output: Reduced token set 减少的标记集
7.5 [7.5] 2503.18623 Training-Free Personalization via Retrieval and Reasoning on Fingerprints
[{'name': 'Deepayan Das, Davide Talon, Yiming Wang, Massimiliano Mancini, Elisa Ricci'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
personalization
multimodal reasoning
Input: Pre-trained Vision-Language Models (VLMs) 预训练视觉语言模型
Step 1: Extract concept fingerprints 提取概念指纹
Step 2: Retrieve similar fingerprints from the database 检索相似的指纹
Step 3: Validate scores through cross-modal verification 验证得分通过跨模态验证
Output: Personal concept identification 个人概念识别

Arxiv 2025-03-24

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2503.16591 UniK3D: Universal Camera Monocular 3D Estimation
[{'name': 'Luigi Piccinelli, Christos Sakaridis, Mattia Segu, Yung-Hsu Yang, Siyuan Li, Wim Abbeloos, Luc Van Gool'}]
3D Reconstruction and Modeling 三维重建 v2
monocular 3D estimation
3D reconstruction
Input: Single image from any camera type 任何类型的单幅图像
Step1: Spherical representation modeling 球面表示建模
Step2: Camera ray decomposition 相机光线分解
Step3: Metric 3D reconstruction 度量级3D重建
Output: Coherent 3D point cloud 连贯的3D点云
9.5 [9.5] 2503.16653 iFlame: Interleaving Full and Linear Attention for Efficient Mesh Generation
[{'name': 'Hanxiao Wang, Biao Zhang, Weize Quan, Dong-Ming Yan, Peter Wonka'}]
Mesh Reconstruction 网格重建 v2
mesh generation
3D modeling
transformer architecture
attention mechanisms
Input: Mesh representations 网格表示
Step1: Interleaving full and linear attention mechanisms 全部与线性注意机制交错
Step2: Hourglass architecture integration 集成沙漏架构
Step3: Efficiency enhancements 效率增强
Output: High-quality 3D meshes 高质量三维网格
9.5 [9.5] 2503.16707 Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
[{'name': 'Jinlong Li, Cristiano Saltori, Fabio Poiesi, Nicu Sebe'}]
3D Scene Understanding 3D场景理解 v2
3D scene understanding 3D场景理解
vision-language models 视觉语言模型
uncertainty estimation 不确定性评估
Input: Multiple foundation models 多个基础模型
Step1: Feature embedding extraction 特征嵌入提取
Step2: Cross-modal knowledge distillation 跨模态知识蒸馏
Step3: Uncertainty estimation and harmonization 不确定性评估与协调
Output: Enhanced 3D scene understanding 强化的3D场景理解
9.5 [9.5] 2503.16710 4D Gaussian Splatting SLAM
[{'name': 'Yanyan Li, Youxu Fang, Zunjie Zhu, Kunyi Li, Yong Ding, Federico Tombari'}]
3D Reconstruction and Modeling 三维重建 v2
4D Gaussian Splatting
SLAM
camera localization
dynamic scenes
3D reconstruction
Input: Sequence of RGB-D images RGB-D图像序列
Step1: Generate motion masks 生成运动掩码
Step2: Classify Gaussian primitives into static and dynamic 静态和动态高斯原语分类
Step3: Model transformation fields with sparse control points and MLP 使用稀疏控制点和MLP建模变换场
Output: 4D Gaussian radiance fields 4D高斯辐射场
9.5 [9.5] 2503.16776 OpenCity3D: What do Vision-Language Models know about Urban Environments?
[{'name': 'Valentin Bieri, Marco Zamboni, Nicolas S. Blumer, Qingxuan Chen, Francis Engelmann'}]
3D Scene Understanding 三维场景理解 v2
Vision-Language Models
3D Reconstruction
Urban Analytics
Input: Aerial multi-view images from urban environments 城市环境的多视角航拍图像
Step1: Generate enriched point cloud from 3D reconstructions 从三维重建生成丰富的点云
Step2: Integrate vision-language models to query urban features 集成视觉语言模型以查询城市特征
Step3: Analyze socio-economic properties using language input 使用语言输入分析社会经济属性
Output: Insights into urban characteristics and analytics 对城市特征和分析的洞察
9.5 [9.5] 2503.16811 Seg2Box: 3D Object Detection by Point-Wise Semantics Supervision
[{'name': 'Maoji Zheng, Ziyu Xu, Qiming Xia, Hai Wu, Chenglu Wen, Cheng Wang'}]
3D Object Detection 3D物体检测 v2
3D object detection 3D物体检测
semantic segmentation 语义分割
LiDAR
autonomous driving 自动驾驶
Input: Point cloud data 点云数据
Step1: Multi-Frame Multi-Scale Clustering (MFMS-C) for pseudo-label generation 多帧多尺度聚类生成伪标签
Step2: Semantic-Guiding Iterative-Mining Self-Training (SGIM-ST) for refining labels 语义引导的迭代挖掘自我训练
Output: Enhanced 3D object detection results 改进的三维物体检测结果
9.5 [9.5] 2503.16822 RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos
[{'name': 'Yuxin Yao, Zhi Deng, Junhui Hou'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian representation
dynamic modeling
novel view synthesis
Input: Monocular videos of articulated objects 单目视频
Step1: Extract skeleton-aware nodes from 3D Gaussians 从三维高斯中提取关节感知节点
Step2: Simplify skeleton using geometric and semantic information 使用几何和语义信息简化骨骼
Step3: Bind skeleton to 3D Gaussian representation 绑定骨骼到三维高斯表示
Output: Skeleton-driven dynamic model 骨架驱动的动态模型
9.5 [9.5] 2503.16924 Optimized Minimal 3D Gaussian Splatting
[{'name': 'Joo Chan Lee, Jong Hwan Ko, Eunbyung Park'}]
Neural Rendering 神经渲染 v2
3D Gaussian Splatting
3D rendering
storage optimization
Input: 3D scenes 3D场景
Step1: Minimize redundancy in Gaussians 最小化高斯冗余
Step2: Create compact attribute representation 创建紧凑属性表示
Step3: Implement sub-vector quantization 实施子向量量化
Output: Reduced storage with minimal Gaussians 减少存储需求的最小高斯
9.5 [9.5] 2503.16964 DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery
[{'name': 'Jiadong Tang, Yu Gao, Dianyi Yang, Liqi Yan, Yufeng Yue, Yi Yang'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
drone imagery
Gaussian Splatting
dynamic scenes
Input: Drone imagery 无人机图像
Step1: Data integration 数据集成
Step2: Masking and segmentation 掩码与分割
Step3: Gaussian splatting algorithm implementation 高斯点云算法实现
Step4: 3D model reconstruction 三维模型重建
Output: Robust 3D reconstruction of scenes 稳健的场景三维重建
9.5 [9.5] 2503.16970 Distilling Monocular Foundation Model for Fine-grained Depth Completion
[{'name': 'Yingping Liang, Yutao Hu, Wenqi Shao, Ying Fu'}]
Depth Estimation 深度估计 v2
Depth Completion 深度补全
Monocular Models 单目模型
Knowledge Distillation 知识蒸馏
Input: Sparse LiDAR inputs 稀疏LiDAR输入
Step1: Generate diverse training data 生成多样化训练数据
Step2: Distill geometric knowledge 提取几何知识
Step3: Fine-tune with SSI Loss 采用SSI Loss进行微调
Output: Enhanced depth completion models 改进的深度补全模型
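The SSI loss in Step3 above can be sketched minimally as a scale-and-shift-invariant depth loss: align the prediction to the target with a closed-form least-squares scale and shift, then measure the residual. This MiDaS-style formulation is an illustrative assumption, not necessarily the paper's exact loss:

```python
import numpy as np

def ssi_loss(pred, target):
    """Scale-and-shift-invariant loss: solve min_{s,t} ||s*pred + t - target||^2
    in closed form, then return the mean absolute residual after alignment.
    Illustrative formulation, not the paper's exact definition."""
    pred, target = pred.ravel(), target.ravel()
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, target, rcond=None)
    return np.mean(np.abs(s * pred + t - target))

# A prediction that differs from the target only by scale and shift
# incurs (near-)zero loss:
target = np.array([1.0, 2.0, 3.0, 4.0])
pred = 0.5 * target + 2.0
print(round(float(ssi_loss(pred, target)), 6))  # → 0.0
```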
9.5 [9.5] 2503.17032 TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting
[{'name': 'Jianchuan Chen, Jingchuan Hu, Gaige Wang, Zhonghua Jiang, Tiansong Zhou, Zhiwen Chen, Chengfei Lv'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
augmented reality
3D Gaussian Splatting
Input: Multi-view sequences 多视角序列
Step1: Creation of parametric template 创建参数模板
Step2: Pre-training of StyleUnet-based network 预训练StyleUnet网络
Step3: Baking deformations into MLP network 将变形转化为MLP网络
Output: Real-time rendering of avatars 实时渲染头像
9.5 [9.5] 2503.17093 ColabSfM: Collaborative Structure-from-Motion by Point Cloud Registration
[{'name': 'Johan Edstedt, André Mateus, Alberto Jaenal'}]
3D Reconstruction 三维重建 v2
3D Reconstruction
SfM
Point Cloud Registration
Input: SfM point clouds (3D Maps) 输入: SfM点云 (三维地图)
Step1: Estimation of joint reference frame 第一步: 估计联合参考框架
Step2: Point cloud registration for SfM 第二步: SfM点云配准
Step3: Neural refiner application 第三步: 应用神经修整器
Output: Merged and registered 3D maps 输出: 合并和注册的三维地图
9.5 [9.5] 2503.17097 R2LDM: An Efficient 4D Radar Super-Resolution Framework Leveraging Diffusion Model
[{'name': 'Boyuan Zheng, Shouyi Lu, Renbo Huang, Minqing Huang, Fan Lu, Wei Tian, Guirong Zhuo, Lu Xiong'}]
3D Reconstruction and Modeling 三维重建与建模 v2
4D radar
point clouds
super-resolution
LiDAR
autonomous driving
Input: Paired raw radar and LiDAR point clouds 原始雷达和激光雷达点云对
Step1: Represent point clouds using voxel features 使用体素特征表示点云
Step2: Apply Latent Voxel Diffusion Model (LVDM) 应用潜在体素扩散模型
Step3: Utilize Latent Point Cloud Reconstruction (LPCR) to rebuild point clouds 使用潜在点云重建模块重建点云
Output: Dense LiDAR-like point clouds 输出:密集的激光雷达样点云
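Step1's voxel-feature representation above can be sketched as simple occupancy voxelization: snap points to a regular grid and count how many fall in each cell. The voxel size and occupancy counting are illustrative assumptions, not R2LDM's actual feature encoder:

```python
import numpy as np

def voxelize(points, voxel_size=0.5):
    """Map each 3D point to an integer voxel index, then return the
    unique occupied voxels and their point counts. The voxel size is
    an arbitrary illustrative choice."""
    idx = np.floor(points / voxel_size).astype(np.int64)
    voxels, counts = np.unique(idx, axis=0, return_counts=True)
    return voxels, counts

pts = np.array([[0.1, 0.1, 0.1],
                [0.2, 0.3, 0.4],   # lands in the same voxel as the point above
                [1.1, 0.0, 0.0]])  # lands in a different voxel
voxels, counts = voxelize(pts)
print(len(voxels))  # → 2
```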
9.5 [9.5] 2503.17106 GAA-TSO: Geometry-Aware Assisted Depth Completion for Transparent and Specular Objects
[{'name': 'Yizhe Liu, Tong Jia, Da Cai, Hao Wang, Dongyue Chen'}]
Depth Estimation 深度估计 v2
depth completion
3D structural features
Input: RGB-D input including depth and RGB images (输入: 包含深度和RGB图像的RGB-D输入)
Step1: Extract 2D features from RGB-D data (步骤1: 从RGB-D数据中提取2D特征)
Step2: Back-project depth to a point cloud for 3D feature extraction (步骤2: 将深度反投影到点云以提取3D特征)
Step3: Use gated cross-modal fusion modules for integrating 2D and 3D features (步骤3: 使用门控跨模态融合模块整合2D和3D特征)
Output: Enhanced depth estimation for transparent and specular objects (输出: 针对透明和高光物体的增强深度估计)
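Step2's back-projection of depth to a point cloud follows the standard pinhole camera model; the intrinsics below are illustrative values, not taken from the paper:

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Back-project a depth map (h, w) into an (h*w, 3) point cloud
    using pinhole intrinsics: x = (u - cx) * z / fx, y = (v - cy) * z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

depth = np.full((4, 4), 2.0)  # a flat plane 2 m in front of the camera
pts = backproject(depth, fx=100, fy=100, cx=2, cy=2)
print(pts.shape)  # → (16, 3)
# First point is pixel (0, 0): x = (0 - 2) * 2 / 100 = -0.04, likewise y.
```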
9.5 [9.5] 2503.17153 Enhancing Steering Estimation with Semantic-Aware GNNs
[{'name': 'Fouad Makiyeh, Huy-Dung Nguyen, Patrick Chareyre, Ramin Hasani, Marc Blanchon, Daniela Rus'}]
3D Reconstruction and Modeling 三维重建 v2
steering estimation
3D spatial information
autonomous driving
Graph Neural Networks
point clouds
Input: Monocular images and LiDAR-based point clouds
Step1: Estimate depth and semantic maps from 2D images using a unified model
Step2: Generate pseudo-3D point clouds from estimated depth
Step3: Integrate 3D point clouds with Graph Neural Network (GNN) and Recurrent Neural Network (RNN) for steering estimation
Output: Enhanced steering predictions using spatial information
9.5 [9.5] 2503.17168 Hi-ALPS -- An Experimental Robustness Quantification of Six LiDAR-based Object Detection Systems for Autonomous Driving
[{'name': 'Alexandra Arzberger, Ramin Tavakoli Kolagari'}]
3D Object Detection 3D目标检测 v2
LiDAR
object detection
autonomous driving
robustness
Input: LiDAR point cloud data LiDAR点云数据
Step1: Implement Hi-ALPS framework 实现Hi-ALPS框架
Step2: Evaluate robustness of object detection systems 评估目标检测系统的鲁棒性
Step3: Analyze perturbation effects on OD systems 分析对OD系统的扰动影响
Output: Robustness metrics for 3D object detection systems 3D目标检测系统的鲁棒性指标
9.5 [9.5] 2503.17182 Radar-Guided Polynomial Fitting for Metric Depth Estimation
[{'name': 'Patrick Rim, Hyoungseob Park, Vadim Ezhov, Jeffrey Moon, Alex Wong'}]
Depth Estimation 深度估计 v2
3D reconstruction
depth estimation
autonomous driving
radar
polynomial fitting
Input: Radar data and monocular depth predictions 雷达数据与单目深度预测
Step1: Polynomial fitting of depth predictions 深度预测的多项式拟合
Step2: Adaptive adjustment of depth non-uniformly 适应性地对深度进行非均匀调整
Step3: Model training with monotonicity regularization 使用单调性正则化进行模型训练
Output: Metric depth maps and error metrics 精确度量的深度图和误差指标
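The core of Step1 above is fitting a polynomial that maps monocular depth predictions to metric depth at the sparse pixels where radar returns exist, then applying it densely. The degree and the direct use of `np.polyfit` are assumptions for illustration; the paper's fitting is more elaborate (adaptive, with monotonicity regularization):

```python
import numpy as np

def radar_guided_fit(mono_depth, radar_depth, radar_mask, degree=2):
    """Fit a polynomial from monocular depth to metric depth at radar
    hits, then evaluate it over the whole depth map. Illustrative sketch."""
    x = mono_depth[radar_mask]           # monocular depth at radar hits
    y = radar_depth[radar_mask]          # sparse metric measurements
    coeffs = np.polyfit(x, y, degree)    # least-squares polynomial fit
    return np.polyval(coeffs, mono_depth)  # dense metric depth map

mono = np.linspace(0.1, 1.0, 100).reshape(10, 10)  # relative depth
metric = 20.0 * mono + 1.0                         # synthetic true metric depth
mask = np.zeros_like(mono, dtype=bool)
mask.ravel()[::7] = True                           # sparse synthetic "radar" hits
out = radar_guided_fit(mono, metric, mask)
print(np.allclose(out, metric))  # → True
```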
9.5 [9.5] 2503.17316 Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors
[{'name': 'Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy, Lourdes Agapito, Jerome Revaud'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
depth completion
multi-view stereo
Input: Multi-view images, camera intrinsics, and depth inputs (RGB, intrinsics, poses)
Step 1: Data integration by incorporating known camera and scene priors
Step 2: Lightweight transformer-based model that allows for modality selection during training
Step 3: Output pointmaps for relative pose estimation and high-resolution reconstruction
9.2 [9.2] 2503.16611 A Recipe for Generating 3D Worlds From a Single Image
[{'name': 'Katja Schwarz, Denys Rozumnyi, Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder'}]
3D Reconstruction 三维重建 v2
3D reconstruction
depth estimation
image generation
Input: Single image (2D panorama) 单幅图像(2D全景)
Step1: Generate coherent panoramas using a diffusion model 使用扩散模型生成连贯的全景图
Step2: Lift panorama into 3D with a metric depth estimator 利用度量深度估计器将全景提升到3D
Step3: Inpaint unobserved regions using point clouds 使用点云对未观察区域进行修复
Output: Immersive 3D world 逼真的3D世界
9.2 [9.2] 2503.17175 Which2comm: An Efficient Collaborative Perception Framework for 3D Object Detection
[{'name': 'Duanrui Yu, Jing You, Xin Pei, Anqi Qu, Dingyu Wang, Shaocheng Jia'}]
3D Object Detection 3D目标检测 v2
3D object detection 3D目标检测
collaborative perception 协作感知
semantic detection boxes 语义检测框
Input: Multi-agent sparse features 多智能体稀疏特征
Step1: Feature extraction 特征提取
Step2: Temporal fusion 时序融合
Step3: Sparse decoding 稀疏解码
Output: 3D object detection boxes 3D目标检测框
9.0 [9.0] 2503.16825 SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion
[{'name': 'Xiyue Guo, Jiarui Hu, Junjie Hu, Hujun Bao, Guofeng Zhang'}]
3D Semantic Scene Completion 3D语义场景补全 v2
3D semantic scene completion
satellite-ground fusion
autonomous driving
Input: Satellite and ground images 卫星和地面图像
Step1: Parallel encoding of satellite and ground views 卫星和地面视图的并行编码
Step2: Feature alignment and correction 特征对齐与修正
Step3: Adaptive fusion of multi-view features 多视角特征的自适应融合
Output: Completed 3D semantic scene 完成的3D语义场景
9.0 [9.0] 2503.16979 Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting
[{'name': 'Jinbo Yan, Rui Peng, Zhiyan Wang, Luyang Tang, Jiayu Yang, Jie Liang, Jiahao Wu, Ronggang Wang'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D reconstruction
dynamic scene reconstruction
Gaussian splatting
Input: Multi-view images 多视角图像
Step1: Generalized Anchor-driven Gaussian Motion Network 引入通用锚点驱动高斯运动网络
Step2: Key-frame-guided Streaming Strategy 关键帧引导流媒体策略
Step3: Real-time evaluation 实时评估
Output: Streamed dynamic scene reconstruction 流媒体动态场景重建
8.5 [8.5] 2503.16535 Vision-Language Embodiment for Monocular Depth Estimation
[{'name': 'Jinchang Zhang, Guoyu Lu'}]
Depth Estimation 深度估计 v2
depth estimation
monocular
robotic perception
Input: RGB images and camera intrinsic properties RGB图像和相机内在特性
Step1: Calculate embodied scene depth 计算具身场景深度
Step2: Integrate depth with image features 深度与图像特征集成
Step3: Use language priors for scene understanding 利用语言先验进行场景理解
Output: Enhanced depth estimations 改进的深度估计
8.5 [8.5] 2503.16538 Leveraging Vision-Language Models for Open-Vocabulary Instance Segmentation and Tracking
[{'name': 'Bastian Pätzold, Jan Nogga, Sven Behnke'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
Instance Segmentation
Open-Vocabulary Detection
Robotics
Input: Structured descriptions from vision-language models (VLMs) 视觉语言模型生成的结构化描述
Step1: Identify visible object instances 识别可见物体实例
Step2: Inform open-vocabulary detector 通知开放词汇探测器
Step3: Extract bounding boxes 提取边界框
Step4: Process image streams in real time 以实时处理图像流
Output: Segmentation masks and tracking capabilities 分割掩码和跟踪能力
8.5 [8.5] 2503.16579 World Knowledge from AI Image Generation for Robot Control
[{'name': 'Jonas Krumme, Christoph Zetzsche'}]
Autonomous Systems and Robotics 自动驾驶 v2
Generative AI
Image Generation
Robot Control
Implicit Knowledge
Input: Images generated by AI 由AI生成的图像
Step1: Analyze world knowledge 分析世界知识
Step2: Apply knowledge to robot tasks 将知识应用于机器人任务
Step3: Generate contextually relevant images 生成上下文相关的图像
Output: Enhanced robot task performance 提高机器人的任务表现
8.5 [8.5] 2503.16709 QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge
[{'name': 'Xuan Shen, Weize Ma, Jing Liu, Changdi Yang, Rui Ding, Quanyi Wang, Henghui Ding, Wei Niu, Yanzhi Wang, Pu Zhao, Jun Lin, Jiuxiang Gu'}]
Depth Estimation 深度估计 v2
Monocular Depth Estimation
Post-Training Quantization
3D Reconstruction
Input: Monocular images 单目图像
Step1: Analyze outlier distribution 分析异常分布
Step2: Apply LogNP polishing optimization 应用LogNP平滑优化
Step3: Update weights for activation compensation 更新权重以补偿激活
Step4: Perform weight quantization with reconstruction 进行带重构的权重量化
Output: Efficient depth estimation model 高效的深度估计模型
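The pipeline above rests on post-training quantization; the basic idea can be sketched as a symmetric, per-tensor 8-bit quantize/dequantize round trip. QuartDepth's actual scheme adds outlier polishing (LogNP) and activation compensation, so this is only the baseline building block, with made-up weights:

```python
import numpy as np

def quantize(w, bits=8):
    """Symmetric per-tensor uniform quantization: scale so the largest
    magnitude maps to the top of the signed integer range."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([-0.5, 0.1, 0.25, 0.49], dtype=np.float32)
q, s = quantize(w)
w_hat = dequantize(q, s)
# Rounding error per weight stays within roughly half a quantization step:
print(bool(np.max(np.abs(w - w_hat)) < s))  # → True
```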
8.5 [8.5] 2503.16742 Digitally Prototype Your Eye Tracker: Simulating Hardware Performance using 3D Synthetic Data
[{'name': 'Esther Y. H. Lin, Yimin Ding, Jogendra Kundu, Yatong An, Mohamed T. El-Haddad, Alexander Fix'}]
3D Reconstruction and Modeling 三维重建 v2
3D synthetic data
eye tracking
NeRF
hardware evaluation
augmented reality
Input: Real 3D eyes data from light dome captures
Step1: Create a hybrid mesh-NeRF representation for eye modeling
Step2: Develop an optical simulator for camera effects
Step3: Synthesize novel viewpoints and evaluate performance
Output: Enhanced predictions of eye tracker hardware performance
8.5 [8.5] 2503.16910 Salient Object Detection in Traffic Scene through the TSOD10K Dataset
[{'name': 'Yu Qiu, Yuhang Sun, Jie Mei, Lin Xiao, Jing Xu'}]
Autonomous Systems and Robotics 自动驾驶系统与机器人 v2
salient object detection
traffic scenes
TSOD10K
Input: Traffic images 交通图像
Step1: Data collection 数据收集
Step2: Dataset creation 数据集创建
Step3: Model development 模型开发
Step4: Evaluation of models 模型评估
Output: Traffic salient object detection results 交通显著性对象检测结果
8.5 [8.5] 2503.16976 GeoT: Geometry-guided Instance-dependent Transition Matrix for Semi-supervised Tooth Point Cloud Segmentation
[{'name': 'Weihao Yu, Xiaoqing Guo, Chenxin Li, Yifan Liu, Yixuan Yuan'}]
Point Cloud Processing 点云处理 v2
3D segmentation
tooth point clouds
semi-supervised learning
Input: Intra-oral scans 口腔内扫描
Step1: Introduce geometric priors 引入几何先验
Step2: Estimate instance-dependent transition matrix (IDTM) 估计实例相关转移矩阵
Step3: Perform segmentation 执行分割
Output: Segmented tooth point clouds 分割的牙齿点云
8.5 [8.5] 2503.17044 ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail
[{'name': 'Chandan Yeshwanth, David Rozenberszki, Angela Dai'}]
3D Scene Understanding 三维场景理解 v2
3D captioning
3D scene understanding
Vision-Language Models
Input: 3D scene scans 3D场景扫描
Step1: Object detection and 3D understanding 对象检测与3D理解
Step2: Multi-level caption generation 多级描述生成
Output: Object- and part-level detailed captions 对象和部分级别的详细描述
8.5 [8.5] 2503.17122 R-LiViT: A LiDAR-Visual-Thermal Dataset Enabling Vulnerable Road User Focused Roadside Perception
[{'name': 'Jonas Mirlach, Lei Wan, Andreas Wiedholz, Hannan Ejaz Keen, Andreas Eich'}]
Autonomous Systems and Robotics 自动驾驶 v2
LiDAR
thermal imaging
autonomous driving
Vulnerable Road Users
Input: Multi-modal sensors (LiDAR, RGB, and thermal) 多模态传感器 (激光雷达、RGB和热成像)
Step1: Data collection 数据收集
Step2: Annotation and alignment 标注与对齐
Step3: Dataset release 数据集发布
Output: R-LiViT dataset R-LiViT 数据集
8.5 [8.5] 2503.17197 FreeUV: Ground-Truth-Free Realistic Facial UV Texture Recovery via Cross-Assembly Inference Strategy
[{'name': 'Xingchao Yang, Takafumi Taketomi, Yuki Endo, Yoshihiro Kanamori'}]
3D Reconstruction and Modeling 三维重建 v2
3D texture recovery
facial UV textures
Input: Single-view 2D images 单视角二维图像
Step1: Appearance feature extraction 外观特征提取
Step2: Structural consistency training 结构一致性训练
Step3: Cross-Assembly inference integration 交叉组装推理集成
Output: Realistic 3D facial UV textures 逼真的三维面部UV纹理
8.5 [8.5] 2503.17352 OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
[{'name': 'Yihe Deng, Hritik Bansal, Fan Yin, Nanyun Peng, Wei Wang, Kai-Wei Chang'}]
Vision-Language Models 视觉语言模型 v2
vision-language models
reasoning capabilities
reinforcement learning
Input: Large vision-language models (LVLMs) 大型视觉语言模型
Step1: Distill reasoning capabilities from text models 从文本模型中提取推理能力
Step2: Generate reasoning steps using image captions 使用图像说明生成推理步骤
Step3: Utilize supervised fine-tuning (SFT) for initial training 利用监督微调进行初始训练
Step4: Apply reinforcement learning (RL) for iterative improvement 应用强化学习进行迭代改进
Output: Improved LVLM with enhanced reasoning capabilities 改进的LVLM,具有增强的推理能力
8.5 [8.5] 2503.17358 Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image
[{'name': 'Jerred Chen, Ronald Clark'}]
Visual Odometry 视觉里程计 v2
camera motion estimation
visual odometry
motion blur
single image
Input: Single motion-blurred image 单张运动模糊图像
Step1: Predict motion flow field and monocular depth map 预测运动流场和单目深度图
Step2: Solve linear least squares problem 解决线性最小二乘问题
Output: Instantaneous camera velocity estimate 瞬时相机速度估计
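Step2's linear least-squares solve can be sketched as recovering a 6-DoF velocity v from stacked per-pixel constraints flow ≈ A @ v. The random Jacobian and noise-free synthetic flow below are illustrative stand-ins for the paper's motion-blur-derived quantities:

```python
import numpy as np

def solve_velocity(A, flow):
    """Least-squares solve of the overdetermined system flow = A @ v
    for the 6-DoF instantaneous camera velocity v."""
    v, *_ = np.linalg.lstsq(A, flow, rcond=None)
    return v

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 6))  # stand-in for stacked per-pixel Jacobians
v_true = np.array([0.1, -0.2, 0.05, 0.01, 0.0, -0.03])
flow = A @ v_true                  # noise-free synthetic flow field
v_est = solve_velocity(A, flow)
print(np.allclose(v_est, v_true))  # → True
```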
7.5 [7.5] 2503.17142 Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models
[{'name': 'Davide Berasi, Matteo Farina, Massimiliano Mancini, Elisa Ricci, Nicola Strisciuglio'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
compositionality
visual embeddings
image generation
Input: Visual embeddings from pre-trained VLMs 来自预训练视觉语言模型的视觉嵌入
Step1: Analyze visual compositionality 分析视觉组成性
Step2: Develop Geodesically Decomposable Embeddings (GDE) 开发几何可分解嵌入
Step3: Evaluate on compositional classification and group robustness 在组合分类和组鲁棒性上评估
Output: Enhanced understanding of visual embeddings 提升视觉嵌入理解

Arxiv 2025-03-21

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2503.15671 CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image
[{'name': 'Arindam Dutta, Meng Zheng, Zhongpai Gao, Benjamin Planche, Anwesha Choudhuri, Terrence Chen, Amit K. Roy-Chowdhury, Ziyan Wu'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D reconstruction
occlusion management
human modeling
Input: Single occluded image 单个被遮挡的图像
Step1: Generate occlusion-free views 生成无遮挡视图
Step2: Apply multiview diffusion model 应用多视角扩散模型
Step3: Predict 3D Gaussians 预测3D高斯
Output: Cohesive 3D representation 连续的3D表示
9.5 [9.5] 2503.15672 GASP: Unifying Geometric and Semantic Self-Supervised Pre-training for Autonomous Driving
[{'name': 'William Ljungbergh, Adam Lilja, Adam Tonderski, Arvid Laveno Ling, Carl Lindström, Willem Verbeke, Junsheng Fu, Christoffer Petersson, Lars Hammarstrand, Michael Felsberg'}]
Autonomous Driving 自动驾驶 v2
self-supervised learning
occupancy prediction
autonomous driving
Input: Future lidar scans, camera images, and ego poses 未来激光雷达扫描、相机图像和自我姿态
Step1: Model geometric and semantic occupancy prediction 模型几何和语义占用预测
Step2: Learn unified representation 学习统一表示
Step3: Validate on autonomous driving benchmarks 在自动驾驶基准上验证
Output: Structured, generalizable representation of the environment 结构化、可泛化的环境表示
9.5 [9.5] 2503.15712 SPNeRF: Open Vocabulary 3D Neural Scene Segmentation with Superpoints
[{'name': 'Weiwen Hu, Niccolò Parodi, Marcus Zepp, Ingo Feldmann, Oliver Schreer, Peter Eisert'}]
3D Segmentation 3D 分割 v2
3D segmentation 3D 分割
Neural Radiance Fields 神经辐射场
geometric primitives 几何原语
CLIP
Input: 3D scenes with CLIP features 处理含有 CLIP 特征的 3D 场景
Step1: Integrate geometric primitives into NeRF 在 NeRF 中整合几何原语
Step2: Generate primitive-wise CLIP features 生成原语级 CLIP 特征
Step3: Apply primitive-based merging with affinity scoring 使用具有亲和力评分的原语合并
Output: Improved 3D segmentation results 改进的 3D 分割结果
9.5 [9.5] 2503.15742 Uncertainty-Aware Diffusion Guided Refinement of 3D Scenes
[{'name': 'Sarosij Bose, Arindam Dutta, Sayak Nag, Junge Zhang, Jiachen Li, Konstantinos Karydis, Amit K. Roy Chowdhury'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
view synthesis
uncertainty quantification
Input: Single RGB image 单一RGB图像
Step1: Gaussian parameter optimization 高斯参数优化
Step2: Iterative refinement 迭代精炼
Step3: Scene rendering 场景渲染
Output: Enhanced 3D scene 改进的3D场景
9.5 [9.5] 2503.15763 OffsetOPT: Explicit Surface Reconstruction without Normals
[{'name': 'Huan Lei'}]
3D Reconstruction and Modeling 三维重建 v2
surface reconstruction
3D point clouds
neural networks
geometry processing
Input: 3D point clouds 三维点云
Step1: Train a neural network to predict surface triangles 训练神经网络以预测表面三角形
Step2: Optimize per-point offsets to improve triangle predictions 优化每个点的偏移以提高三角形预测
Output: Reconstructed explicit surfaces 还原的显式表面
9.5 [9.5] 2503.15835 BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting
[{'name': 'Yiren Lu, Yunlai Zhou, Disheng Liu, Tuo Liang, Yu Yin'}]
3D Reconstruction and Modeling 三维重建 v2
dynamic scene reconstruction
3D Gaussian Splatting
motion blur
camera motion
object motion
Input: Blurry images with dynamic scenes 含动态场景的模糊图像
Step1: Camera motion deblurring 相机运动去模糊
Step2: Object motion deblurring 物体运动去模糊
Step3: Image alignment with sharp inputs 与清晰输入图像对齐
Output: High-quality dynamic scene reconstructions 高质量动态场景重建
9.5 [9.5] 2503.15855 VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling
[{'name': 'Hyojun Go, Byeongjun Park, Hyelin Nam, Byung-Hoon Kim, Hyungjin Chung, Changick Kim'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
text-to-3D generation
multi-view images
Input: Text prompts 文本提示
Step1: Dual-stream architecture development 双流架构开发
Step2: Joint modeling of multi-view images and camera poses 多视角图像和相机姿态的联合建模
Step3: Asynchronous sampling strategy implementation 异步采样策略实施
Output: Realistic 3D Gaussian splats 真实的3D高斯点云
9.5 [9.5] 2503.15897 Learning 3D Scene Analogies with Neural Contextual Scene Maps
[{'name': 'Junho Kim, Gwangtak Bae, Eun Sun Lee, Young Min Kim'}]
3D Reconstruction and Modeling 3D重建与建模 v2
3D scene analogy
neural contextual scene maps
trajectory transfer
object placement
Input: 3D scenes with regions having spatial relationships 3D场景与空间关系区域
Step1: Extract descriptor fields from scenes 从场景中提取描述符字段
Step2: Align descriptor fields using smooth maps 使用平滑映射对齐描述符字段
Step3: Estimate dense mappings between scene regions 估计场景区域之间的密集映射
Output: Neural contextual scene maps 神经上下文场景图
9.5 [9.5] 2503.15898 Reconstructing In-the-Wild Open-Vocabulary Human-Object Interactions
[{'name': 'Boran Wen, Dingbang Huang, Zichen Zhang, Jiahong Zhou, Jianbin Deng, Jingyu Gong, Yulong Chen, Lizhuang Ma, Yong-Lu Li'}]
3D Reconstruction 三维重建 v2
3D reconstruction
human-object interaction
autonomous systems
Input: Single images 单幅图像
Step1: Data acquisition 数据获取
Step2: 3D annotation pipeline development 3D注释管道开发
Step3: Use Gaussian-HOI optimizer 高斯HOI优化器
Output: Open-vocabulary 3D HOI dataset 开放词汇3D HOI数据集
9.5 [9.5] 2503.15908 Enhancing Close-up Novel View Synthesis via Pseudo-labeling
[{'name': 'Jiatong Xia, Libo Sun, Lingqiao Liu'}]
Neural Rendering 神经渲染 v2
novel view synthesis
pseudo-labeling
Neural Radiance Fields
close-up views
Input: Training images with distant viewpoints 远处视角的训练图像
Step1: Generate virtual close-up viewpoints 生成虚拟近距离视角
Step2: Create warped images from original training images 从原始训练图像创建变形图像
Step3: Evaluate consistency and occlusion for pseudo-training data 评估伪训练数据的一致性和遮挡
Step4: Train radiance fields with the pseudo-training data 使用伪训练数据训练辐射场模型
Output: Enhanced rendering of close-up views 改进的近距离视角渲染
9.5 [9.5] 2503.15917 Learning to Efficiently Adapt Foundation Models for Self-Supervised Endoscopic 3D Scene Reconstruction from Any Cameras
[{'name': 'Beilei Cui, Long Bai, Mobarakol Islam, An Wang, Zhiqi Ma, Yiming Huang, Feng Li, Zhen Chen, Zhongliang Jiang, Nassir Navab, Hongliang Ren'}]
3D Reconstruction 三维重建 v2
3D scene reconstruction
self-supervised learning
depth estimation
endoscopic surgery
Input: Surgical videos from any cameras 从任意相机获取的手术视频
Step1: Efficient adaptation of foundation models 基础模型的高效适应
Step2: Simultaneous estimation of depth maps, poses, and camera parameters 同时估计深度图、姿态和相机参数
Step3: 3D scene reconstruction pipeline using estimated parameters 使用估计的参数进行三维场景重建
Output: Optimized 3D scene reconstruction 优化的三维场景重建
9.5 [9.5] 2503.15975 Acc3D: Accelerating Single Image to 3D Diffusion Models via Edge Consistency Guided Score Distillation
[{'name': 'Kendong Liu, Zhiyu Zhu, Hui Liu, Junhui Hou'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
image-to-3D generation
diffusion models
Input: Single images 单张图像
Step1: Edge consistency-based refinement 基于边缘一致性的改进
Step2: Score function regularization 分数函数正则化
Step3: Adversarial augmentation 对抗性增强
Output: High-quality 3D models 高质量三维模型
9.5 [9.5] 2503.15997 Automating 3D Dataset Generation with Neural Radiance Fields
[{'name': 'P. Schulz, T. Hempel, A. Al-Hamadi'}]
3D Reconstruction and Modeling 三维重建 v2
3D dataset generation
neural radiance fields
pose estimation
Input: 2D images of target objects 目标对象的2D图像
Step1: 3D model creation 3D模型创建
Step2: Dataset generation 数据集生成
Output: Annotated 3D datasets 带注释的3D数据集
9.5 [9.5] 2503.16263 From Monocular Vision to Autonomous Action: Guiding Tumor Resection via 3D Reconstruction
[{'name': "Ayberk Acar, Mariana Smith, Lidia Al-Zogbi, Tanner Watts, Fangjie Li, Hao Li, Nural Yilmaz, Paul Maria Scheikl, Jesse F. d'Almeida, Susheela Sharma, Lauren Branscombe, Tayfun Efe Ertop, Robert J. Webster III, Ipek Oguz, Alan Kuntz, Axel Krieger, Jie Ying Wu"}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D reconstruction
monocular vision
tumor resection
structure from motion
Input: RGB images RGB图像
Step1: Data integration 数据集成
Step2: Algorithm evaluation 算法评估
Step3: Segmentation generation 分割生成
Output: Segmented point clouds with 3D reconstruction 带有三维重建的分割点云
9.5 [9.5] 2503.16282 Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model
[{'name': 'Zhaochong An, Guolei Sun, Yun Liu, Runjia Li, Junlin Han, Ender Konukoglu, Serge Belongie'}]
3D point cloud segmentation 点云分割 v2
3D point cloud segmentation
Vision-Language Models
few-shot learning
Input: 3D point cloud data 3D点云数据
Step1: Pseudo-label selection 伪标签选择
Step2: Adaptive infilling strategy 自适应填充策略
Step3: Base mix strategy 基础混合策略
Output: Enhanced segmentation model 改进的分割模型
9.5 [9.5] 2503.16302 Unleashing Vecset Diffusion Model for Fast Shape Generation
[{'name': 'Zeqiang Lai, Yunfei Zhao, Zibo Zhao, Haolin Liu, Fuyun Wang, Huiwen Shi, Xianghui Yang, Qinxiang Lin, Jinwei Huang, Yuhong Liu, Jie Jiang, Chunchao Guo, Xiangyu Yue'}]
3D Generation 三维生成 v2
3D shape generation 3D形状生成
diffusion models 扩散模型
VAE (Variational Autoencoder) 变分自编码器
Input: 3D shape data 3D形状数据
Step1: Analyze diffusion sampling 分析扩散采样
Step2: Implement FlashVDM framework 实现FlashVDM框架
Step3: Optimize VAE decoding 优化VAE解码
Output: High-speed 3D shape generation 高速三维形状生成
9.5 [9.5] 2503.16318 Dynamic Point Maps: A Versatile Representation for Dynamic 3D Reconstruction
[{'name': 'Edgar Sucar, Zihang Lai, Eldar Insafutdinov, Andrea Vedaldi'}]
3D Reconstruction and Modeling 三维重建 v2
Dynamic Point Maps
3D Reconstruction
Video Depth Prediction
Input: Pair of images 图像对
Step1: Define point maps 定义点图
Step2: Predict dynamic point maps 预测动态点图
Step3: Evaluate across benchmarks 在基准上评估
Output: Enhanced dynamic reconstruction 改进的动态重建
9.5 [9.5] 2503.16338 Gaussian Graph Network: Learning Efficient and Generalizable Gaussian Representations from Multi-view Images
[{'name': 'Shengjun Zhang, Xin Fei, Fangfu Liu, Haixu Song, Yueqi Duan'}]
3D Reconstruction and Modeling 三维重建 v2
Gaussian Graph Network
multi-view images
3D Gaussian Splatting
efficient representation
novel view synthesis
Input: Multi-view images 多视角图像
Step1: Construct Gaussian Graphs 建立高斯图
Step2: Message passing at Gaussian level 高斯级别的消息传递
Step3: Gaussian pooling aggregation 高斯池化聚合
Output: Efficient Gaussian representations 高效的高斯表示
9.5 [9.5] 2503.16399 SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World
[{'name': 'Chen Chen, Zhirui Wang, Taowei Sheng, Yi Jiang, Yundu Li, Peirui Cheng, Luning Zhang, Kaiqiang Chen, Yanfeng Hu, Xue Yang, Xian Sun'}]
3D Reconstruction and Modeling 三维重建 v2
3D occupancy prediction
satellite imagery
autonomous driving
Input: Historical satellite imagery and street-view images 历史卫星图像与街道视图图像
Step1: Data integration with GPS & IMU data 与GPS和IMU数据进行集成
Step2: Implement Dynamic-Decoupling Fusion for inconsistencies 进行动态解耦融合以解决不一致问题
Step3: Use 3D-Proj Guidance for feature extraction 使用 3D 投影引导进行特征提取
Step4: Apply Uniform Sampling Alignment for sampling adjustments 使用均匀采样对齐进行采样调整
Output: Enhanced 3D occupancy prediction model 输出: 改进的 3D 占用预测模型
9.5 [9.5] 2503.16412 DreamTexture: Shape from Virtual Texture with Analysis by Augmentation
[{'name': 'Ananta R. Bhattarai, Xingzhe He, Alla Sheffer, Helge Rhodin'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
monocular depth
texture alignment
Input: Monocular images 单眼图像
Step1: Texture alignment 纹理对齐
Step2: Depth reconstruction 深度重建
Step3: Texture optimization 纹理优化
Output: 3D object representation 3D对象表示
9.5 [9.5] 2503.16413 M3: 3D-Spatial MultiModal Memory
[{'name': 'Xueyan Zou, Yuchen Song, Ri-Zhao Qiu, Xuanbin Peng, Jianglong Ye, Sifei Liu, Xiaolong Wang'}]
3D Spatial Memory 3D空间记忆 v2
3D Gaussian Splatting
multimodal memory
autonomous systems
Input: Scene video clips 场景视频片段
Step 1: Implement Gaussian splatting 实现高斯点云技术
Step 2: Integrate features from foundation models 集成基础模型特征
Step 3: Optimize memory structure 优化记忆结构
Output: Compressed multimodal memory 压缩的多模态记忆
9.5 [9.5] 2503.16429 Sonata: Self-Supervised Learning of Reliable Point Representations
[{'name': 'Xiaoyang Wu, Daniel DeTone, Duncan Frost, Tianwei Shen, Chris Xie, Nan Yang, Jakob Engel, Richard Newcombe, Hengshuang Zhao, Julian Straub'}]
3D Reconstruction and Modeling 三维重建 v2
3D self-supervised learning
point cloud
representation quality
Input: Point clouds 点云
Step1: Identify geometric shortcuts 识别几何捷径
Step2: Apply self-supervised learning techniques 应用自监督学习技术
Step3: Enhance representation quality 提升表示质量
Output: Reliable point representations 可靠的点表示
9.2 [9.2] 2503.15667 DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis
[{'name': 'Yuming Gu, Phong Tran, Yujian Zheng, Hongyi Xu, Heyuan Li, Adilbek Karmanov, Hao Li'}]
Image Generation 图像生成 v2
360-degree synthesis
human head generation
neural rendering
Input: Single-view portrait images 单视角肖像图像
Step1: Generate back-of-head details using ControlNet 生成后脑勺细节
Step2: Dual appearance module ensures consistency 采用双重外观模块确保一致性
Step3: Train on continuous view sequences 训练于连续视图序列
Output: Generate 360-degree consistent head views 生成360度一致的头部视图
9.0 [9.0] 2503.15666 Toward Scalable, Flexible Scene Flow for Point Clouds
[{'name': 'Kyle Vedder'}]
3D Reconstruction and Modeling 3D重建与建模 v2
scene flow
point clouds
3D motion estimation
scalability
Input: Temporally successive point cloud observations 时间上连续的点云观测
Step1: Contextualize scene flow and prior methods 上下文化场景流及其先前方法
Step2: Build scalable scene flow estimators 构建可扩展的场景流估计器
Step3: Introduce a benchmark for estimate quality 引入估计质量基准
Step4: Develop an unsupervised scene flow estimator 开发无监督场景流估计器
Output: Enhanced scene flow estimations 改进的场景流估计
9.0 [9.0] 2503.15877 Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation
[{'name': 'Tiange Xiang, Kai Li, Chengjiang Long, Christian Häne, Peihong Guo, Scott Delp, Ehsan Adeli, Li Fei-Fei'}]
3D Generation 三维生成 v2
3D generation
diffusion models
Gaussian fitting
Input: Pre-trained 2D diffusion models 预训练的2D扩散模型
Step1: Create Gaussian Atlas from 3D objects 从3D对象创建高斯图
Step2: Fine-tune 2D models for 3D output 对2D模型进行微调以生成3D输出
Output: Generated 3D Gaussian structures 生成的3D高斯结构
9.0 [9.0] 2503.16422 1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering
[{'name': 'Yuheng Yuan, Qiuhong Shen, Xingyi Yang, Xinchao Wang'}]
Dynamic Scene Rendering 动态场景渲染 v2
4D Gaussian Splatting
dynamic scene reconstruction
real-time rendering
Input: Dynamic scene data 动态场景数据
Step1: Analyze temporal redundancy 分析时间冗余
Step2: Implement 4DGS-1K framework 实施4DGS-1K框架
Step3: Prune short-lifespan Gaussians 修剪短暂生命周期的高斯
Step4: Filter inactive Gaussians 过滤非活动高斯
Output: Optimized scene representation 优化的场景表示
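Step3's pruning above can be sketched as dropping Gaussians whose temporal lifespan (number of frames in which they contribute) falls below a threshold. Both the lifespan metric and the threshold are placeholders, not the paper's criteria:

```python
import numpy as np

def prune_short_lifespan(lifespans, threshold):
    """Return a boolean keep-mask: True for Gaussians active in at
    least `threshold` frames. Illustrative placeholder criterion."""
    return lifespans >= threshold

lifespans = np.array([1, 50, 3, 200, 7])  # frames each Gaussian is active
keep = prune_short_lifespan(lifespans, threshold=10)
print(keep.tolist())  # → [False, True, False, True, False]
```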
8.5 [8.5] 2503.15676 High Temporal Consistency through Semantic Similarity Propagation in Semi-Supervised Video Semantic Segmentation for Autonomous Flight
[{'name': 'Cédric Vincent, Taehyoung Kim, Henri Meeß'}]
Autonomous Systems and Robotics 自动驾驶机器人 v2
video semantic segmentation
autonomous systems
temporal consistency
Input: Aerial video frames 航空视频帧
Step1: Semantic segmentation using image model 使用图像模型进行语义分割
Step2: Temporal prediction propagation 时间预测传播
Step3: Knowledge distillation for semi-supervised training 半监督训练的知识蒸馏
Output: Consistent and accurate segmentation predictions 一致和准确的分割预测
8.5 [8.5] 2503.15778 AutoDrive-QA- Automated Generation of Multiple-Choice Questions for Autonomous Driving Datasets Using Large Vision-Language Models
[{'name': 'Boshra Khalili, Andrew W. Smyth'}]
Autonomous Systems and Robotics 自主系统与机器人 v2
autonomous driving
question answering
vision-language models
Input: Driving QA datasets 驾驶问答数据集
Step1: Data integration 数据集成
Step2: MCQ conversion methodology MCQ转换方法
Step3: Evaluation on public datasets 在公共数据集上评估
Output: Standardized evaluation framework 标准化评估框架
8.5 [8.5] 2503.15818 Computation-Efficient and Recognition-Friendly 3D Point Cloud Privacy Protection
[{'name': 'Haotian Ma, Lin Gu, Siyi Wu, Yingying Zhu'}]
3D Reconstruction and Modeling 三维重建 v2
3D point cloud
privacy protection
flow-based generative model
Input: 3D point cloud data 3D点云数据
Step1: Define the 3D point cloud privacy problem 定义3D点云隐私问题
Step2: Implement the PointFlowGMM framework 实现PointFlowGMM框架
Step3: Project point cloud into latent Gaussian mixture space 将点云投影到潜在高斯混合空间
Step4: Apply rotation for privacy protection 应用旋转以保护隐私
Output: Encrypted 3D point clouds with preserved classification capabilities 输出: 具有保留分类能力的加密3D点云
8.5 [8.5] 2503.15875 MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving
[{'name': 'Haiguang Wang, Daqi Liu, Hongwei Xie, Haisong Liu, Enhui Ma, Kaicheng Yu, Limin Wang, Bing Wang'}]
Video Generation 视频生成 v2
video generation
autonomous driving
world models
Input: Multi-view video data 多视角视频数据
Step1: Generate high-fidelity long videos 生成高保真长时间视频
Step2: Stabilize video generation 稳定视频生成
Step3: Correct distortion of dynamic objects 修正动态物体的失真
Output: Long-duration coherent videos 长时段的一致视频
8.5 [8.5] 2503.15905 Jasmine: Harnessing Diffusion Prior for Self-supervised Depth Estimation
[{'name': 'Jiyuan Wang, Chunyu Lin, Cheng Guan, Lang Nie, Jing He, Haodong Li, Kang Liao, Yao Zhao'}]
Depth Estimation 深度估计 v2
Depth Estimation 深度估计
Self-supervised Learning 自监督学习
Stable Diffusion 稳定扩散
Input: Monocular images 单目图像
Step1: Hybrid image reconstruction 混合图像重建
Step2: Scale-Shift GRU development 比例-偏移GRU开发
Step3: Self-supervised depth estimation 自监督深度估计
Output: Accurate depth maps 精确的深度图
8.5 [8.5] 2503.15910 No Thing, Nothing: Highlighting Safety-Critical Classes for Robust LiDAR Semantic Segmentation in Adverse Weather
[{'name': 'Junsung Park, Hwijeong Lee, Inha Kang, Hyunjung Shim'}]
Autonomous Systems and Robotics 自动驾驶 v2
LiDAR semantic segmentation
autonomous driving
adverse weather
Input: LiDAR point cloud data
Step1: Identify performance gaps in existing models
Step2: Develop methods to bind point features to superclasses
Step3: Define local regions for cleaning data
Output: Improved predictions for 'things' categories
8.5 [8.5] 2503.16000 SenseExpo: Efficient Autonomous Exploration with Prediction Information from Lightweight Neural Networks
[{'name': 'Haojia Gao, Haohua Que, Hoiian Au, Weihao Shan, Mingkai Liu, Yusen Qin, Lei Mu, Rong Zhao, Xinghua Yang, Qi Wei, Fei Qiao'}]
Autonomous Systems and Robotics 自动驾驶 v2
autonomous exploration
prediction network
Generative Adversarial Networks (GANs)
Robotics
efficient frameworks
Input: Partial observations captured by the robot's onboard sensors 通过机器人的传感器捕获的部分观测
Step1: Local map prediction 基于局部地图的预测
Step2: Model integration with GANs, Transformers, and FFC 用GAN、Transformer和FFC集成模型
Step3: Efficiency evaluation 评估效率
Output: Efficient autonomous exploration framework 高效的自主探索框架
8.5 [8.5] 2503.16125 Uncertainty Meets Diversity: A Comprehensive Active Learning Framework for Indoor 3D Object Detection
[{'name': 'Jiangyi Wang, Na Zhao'}]
3D Object Detection 3D目标检测 v2
3D object detection
active learning
indoor environments
uncertainty
diversity
Input: Indoor 3D object data 室内3D物体数据
Step1: Sample uncertainty assessment 样本不确定性评估
Step2: Diversity optimization 多样性优化
Step3: Active sample selection 主动样本选择
Output: Annotated samples for indoor 3D detection 为室内3D检测注释的样本
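Combining uncertainty and diversity for sample selection (Step1–Step3 above) is commonly done greedily. A minimal sketch, assuming a hypothetical weighted criterion rather than the paper's exact formulation:

```python
import math

# Hypothetical sketch of uncertainty-plus-diversity active sample selection:
# greedily pick samples maximizing a weighted sum of model uncertainty and
# distance to the already-selected set (illustrative only; the weighting and
# distance are assumptions, not the paper's criterion).
def select_samples(features, uncertainties, k, alpha=0.5):
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        def score(i):
            if not selected:
                diversity = 1.0
            else:
                diversity = min(math.dist(features[i], features[j])
                                for j in selected)
            return alpha * uncertainties[i] + (1 - alpha) * diversity
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

The diversity term keeps the annotation budget from being spent on near-duplicate uncertain samples.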
8.5 [8.5] 2503.16289 SceneMI: Motion In-betweening for Modeling Human-Scene Interactions
[{'name': 'Inwoo Hwang, Bing Zhou, Young Min Kim, Jian Wang, Chuan Guo'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D reconstruction
motion in-betweening
human-scene interactions
generative modeling
Input: Noisy keyframes and scenes 噪声关键帧和场景
Step1: Scene encoding through dual descriptors 场景编码通过双重描述符
Step2: Dual scene context processing 双重场景上下文处理
Step3: Denoising and keyframe interpolation 去噪和关键帧插值
Output: Smooth motion transitions and scene reconstructions 平滑的运动过渡和场景重建
8.5 [8.5] 2503.16378 Panoptic-CUDAL Technical Report: Rural Australia Point Cloud Dataset in Rainy Conditions
[{'name': 'Tzu-Yun Tseng, Alexey Nekrasov, Malcolm Burdorf, Bastian Leibe, Julie Stephany Berrio, Mao Shan, Stewart Worrall'}]
Autonomous Driving 自动驾驶 v2
LiDAR
autonomous driving
dataset
panoptic segmentation
Input: Synchronized sensor data 同步传感器数据
Step1: Data collection 数据收集
Step2: Annotation of LiDAR and image data LiDAR 和图像数据的标注
Step3: Model evaluation and analysis 模型评估与分析
Output: Baseline results for segmentation methods 语义分割方法的基线结果
8.5 [8.5] 2503.16396 SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation
[{'name': 'Chun-Han Yao, Yiming Xie, Vikram Voleti, Huaizu Jiang, Varun Jampani'}]
Image and Video Generation 图像生成与视频生成 v2
3D asset generation
multi-view video
4D generation
video diffusion model
Input: Monocular video 单目视频
Step1: Network architecture modification 网络架构修改
Step2: Data curation 数据整理
Step3: Progressive training strategy 逐步训练策略
Step4: 4D optimization 4D优化
Output: High-quality multi-view videos 高质量多视角视频
8.5 [8.5] 2503.16420 SynCity: Training-Free Generation of 3D Worlds
[{'name': 'Paul Engstler, Aleksandar Shtedritski, Iro Laina, Christian Rupprecht, Andrea Vedaldi'}]
3D Generation 三维生成 v2
3D generation
textual descriptions
tile-based generation
Input: Textual descriptions 文本描述
Step1: Generate 3D tiles 生成3D瓦片
Step2: Stitch tiles together 拼接瓦片
Step3: Ensure geometric consistency 确保几何一致性
Output: Large and immersive 3D worlds 输出: 大型且沉浸式的3D世界
7.5 [7.5] 2503.15886 Enhancing Zero-Shot Image Recognition in Vision-Language Models through Human-like Concept Guidance
[{'name': 'Hui Liu, Wenya Wang, Kecheng Chen, Jie Liu, Yibing Liu, Tiexin Qin, Peisong He, Xinghao Jiang, Haoliang Li'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Zero-Shot Generalization
Vision Language Models
Concept-guided Reasoning
Input: Zero-shot image recognition data 零样本图像识别数据
Step1: Concept modeling 概念建模
Step2: Importance sampling algorithm 重要性采样算法
Step3: Generate discriminative concepts 生成可区分的概念
Output: Enhanced zero-shot recognition results 改进的零样本识别结果
6.5 [6.5] 2503.16365 JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse
[{'name': 'Muyao Li, Zihao Wang, Kaichen He, Xiaojian Ma, Yitao Liang'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision Language Models
decision-making
Input: Vision Language models 视觉语言模型
Step1: Visual Language Post-Training 视觉语言后训练
Step2: Action decision-making 行为决策
Output: Enhanced decision-making capabilities 改进的决策能力
6.5 [6.5] 2503.16397 Scale-wise Distillation of Diffusion Models
[{'name': 'Nikita Starodubcev, Denis Kuznedelev, Artem Babenko, Dmitry Baranchuk'}]
Image Generation 图像生成 v2
Diffusion Models
Text-to-Image Generation
Generative Models
Input: Low-resolution data 低分辨率数据
Step1: Scale-wise generation 按比例生成
Step2: Distribution matching 分布匹配
Step3: Resolution upscaling 分辨率上升
Output: High-quality generated images 高质量生成图像

Arxiv 2025-03-19

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2503.13587 Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception
[{'name': 'Dingkang Liang, Dingyuan Zhang, Xin Zhou, Sifan Tu, Tianrui Feng, Xiaofan Li, Yumeng Zhang, Mingyang Du, Xiao Tan, Xiang Bai'}]
Unified World Model 统一世界模型 v2
driving world model
future prediction
depth estimation
autonomous driving
Input: Current image 当前图像
Step1: Dual-Latent Sharing scheme 双潜在共享方案
Step2: Multi-scale Latent Interaction mechanism 多尺度潜在交互机制
Step3: Predict future image-depth pairs 预测未来图像-深度对
Output: Unified future predictions 统一的未来预测
9.5 [9.5] 2503.13710 Improving Geometric Consistency for 360-Degree Neural Radiance Fields in Indoor Scenarios
[{'name': 'Iryna Repinetska, Anna Hilsmann, Peter Eisert'}]
Neural Rendering 神经渲染 v2
Neural Radiance Fields
3D reconstruction
depth estimation
Input: 360-degree indoor images 360度室内图像
Step1: Dense depth priors calculation 密集深度先验计算
Step2: Novel depth loss function formulation 新的深度损失函数设计
Step3: Patch-based depth regularization implementation 贴片深度正则化实施
Output: Improved rendering quality 提高渲染质量
9.5 [9.5] 2503.13721 SED-MVS: Segmentation-Driven and Edge-Aligned Deformation Multi-View Stereo with Depth Restoration and Occlusion Constraint
[{'name': 'Zhenlong Yuan, Zhidong Yang, Yujun Cai, Kuangxin Wu, Mufan Liu, Dapeng Zhang, Hao Jiang, Zhaoxin Li, Zhaoqi Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction 三维重建
Multi-View Stereo 多视角立体
occlusion-aware reconstruction 遮挡感知重建
Input: Multi-view images 多视角图像
Step1: Panoptic segmentation for depth edge guidance 采用全景分割作为深度边缘指导
Step2: Multi-trajectory diffusion strategy to align patches with depth edges 多轨迹扩散策略以确保补丁与深度边缘对齐
Step3: Combine sparse points and monocular depth map to restore reliable depth map 结合稀疏点和单目深度图以恢复可靠的深度图
Output: Accurate 3D reconstruction of the scene or object 输出: 场景或对象的准确三维重建
9.5 [9.5] 2503.13739 Learning from Synchronization: Self-Supervised Uncalibrated Multi-View Person Association in Challenging Scenes
[{'name': 'Keqi Chen, Vinkle Srivastav, Didier Mutter, Nicolas Padoy'}]
Multi-view Stereo 多视角立体 v2
multi-view person association
self-supervised learning
geometric constraints
Input: Multi-view RGB images 多视角RGB图像
Step1: Encoder-decoder model encoding 编码解码模型编码
Step2: Self-supervised learning framework training 训练自监督学习框架
Step3: Synchronization task for image pairs 图像对的同步任务
Output: Geometric feature encoding 几何特征编码
9.5 [9.5] 2503.13743 MonoCT: Overcoming Monocular 3D Detection Domain Shift with Consistent Teacher Models
[{'name': 'Johannes Meier, Louis Inchingolo, Oussema Dhaouadi, Yan Xia, Jacques Kaiser, Daniel Cremers'}]
3D Object Detection 3D目标检测 v2
Monocular 3D detection 单目3D检测
Depth estimation 深度估计
Domain adaptation 域适应
Input: Monocular RGB images 单目RGB图像
Step1: Generalized Depth Enhancement (GDE) module development 开发广义深度增强(GDE)模块
Step2: Pseudo Label Scoring (PLS) module design 设计伪标签评分(PLS)模块
Step3: Extensive experiments on multiple benchmarks 在多个基准上进行广泛实验
Output: Improved monocular 3D detection performance 改进的单目3D检测性能
9.5 [9.5] 2503.13816 MOSAIC: Generating Consistent, Privacy-Preserving Scenes from Multiple Depth Views in Multi-Room Environments
[{'name': 'Zhixuan Liu, Haokun Zhu, Rui Chen, Jonathan Francis, Soonmin Hwang, Ji Zhang, Jean Oh'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
privacy-preserving
multi-view images
depth images
Input: Depth images only 仅深度图像
Step1: Multi-view overlapped scene alignment 多视角重叠场景对齐
Step2: Inference-time optimization 推断时优化
Step3: Generation of consistent RGB images 生成一致的RGB图像
Output: Privacy-preserving digital twins 保护隐私的数字孪生
9.5 [9.5] 2503.13861 RAD: Retrieval-Augmented Decision-Making of Meta-Actions with Vision-Language Models in Autonomous Driving
[{'name': 'Yujin Wang, Quanfeng Liu, Zhengxin Jiang, Tianyi Wang, Junfeng Jiao, Hongqing Chu, Bingzhao Gao, Hong Chen'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
vision-language models
autonomous driving
decision-making
spatial perception
Input: Vision-language models for autonomous driving 视觉语言模型的自主驾驶
Step1: Embedding flow for scene encoding 场景编码的嵌入流
Step2: Retrieval flow to fetch relevant scenes 检索流获取相关场景
Step3: Generating flow to produce meta-actions 生成流生成元动作
Output: Enhanced decision-making for autonomous driving 提升自主驾驶的决策能力
9.5 [9.5] 2503.13914 PSA-SSL: Pose and Size-aware Self-Supervised Learning on LiDAR Point Clouds
[{'name': 'Barza Nisar, Steven L. Waslander'}]
3D Reconstruction and Modeling 3D重建与建模 v2
3D semantic segmentation
LiDAR point clouds
self-supervised learning
Input: LiDAR point clouds LiDAR点云
Step1: Define bounding box regression as pretext task 定义边界框回归作为预训练任务
Step2: Incorporate LiDAR beam pattern augmentation 融入激光雷达束模式增强
Step3: Train model using contrastive learning 采用对比学习训练模型
Output: Object pose and size-aware features 输出:物体姿态和尺寸感知特征
9.5 [9.5] 2503.13948 Light4GS: Lightweight Compact 4D Gaussian Splatting Generation via Context Model
[{'name': 'Mufan Liu, Qi Yang, He Huang, Wenjie Huang, Zhenlong Yuan, Zhu Li, Yiling Xu'}]
3D Gaussian Splatting 3D高斯点云 v2
4D Gaussian Splatting
3D Reconstruction
Novel View Synthesis
Input: Temporal deformation primitives 时间变形原语
Step1: Spatio-temporal significance pruning 空间-时间显著性修剪
Step2: Deep context model integration 深度上下文模型集成
Output: Compressed lightweight dynamic 3DGS 压缩轻量级动态3DGS
9.5 [9.5] 2503.14002 MeshFleet: Filtered and Annotated 3D Vehicle Dataset for Domain Specific Generative Modeling
[{'name': 'Damian Boborzi, Phillip Mueller, Jonas Emrich, Dominik Schmid, Sebastian Mueller, Lars Mikelsons'}]
3D Generation 三维生成 v2
3D reconstruction
3D dataset
generative modeling
vehicle models
Input: 3D models from Objaverse-XL 来自Objaverse-XL的3D模型
Step1: Create a manually labeled subset 创建手动标记子集
Step2: Train a quality classifier 训练质量分类器
Step3: Apply automated filtering 应用自动化过滤
Output: High-quality filtered 3D vehicle dataset 输出:高质量过滤的3D车辆数据集
9.5 [9.5] 2503.14029 Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting
[{'name': 'Runsong Zhu, Shi Qiu, Zhengzhe Liu, Ka-Hei Hui, Qianyi Wu, Pheng-Ann Heng, Chi-Wing Fu'}]
3D Reconstruction and Modeling 三维重建 v2
3D segmentation
Gaussian splatting
computer vision
Input: Multi-view 2D instance segmentation 多视角2D实例分割
Step1: Gaussian-level feature augmentation 高斯级特征增强
Step2: Object-level codebook learning 对象级别的词汇表学习
Step3: Association learning 关联学习
Step4: Noisy label filtering 噪声标签过滤
Output: Accurate 3D scene segmentation 准确的3D场景分割
9.5 [9.5] 2503.14198 RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images
[{'name': 'Junjin Xiao, Qing Zhang, Yonewei Nie, Lei Zhu, Wei-Shi Zheng'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
novel view synthesis
Gaussian splatting
autonomous driving
Input: Sparse multi-view images 稀疏多视角图像
Step1: Lift SMPL vertices to 3D points 提升SMPL顶点到3D点
Step2: Predict image-aligned 3D prior points 预测与图像对齐的3D先验点
Step3: Regress coarse and fine Gaussian parameters 回归粗糙和细粒度的高斯参数
Output: High-fidelity novel views 高保真新视图
9.5 [9.5] 2503.14219 Segmentation-Guided Neural Radiance Fields for Novel Street View Synthesis
[{'name': 'Yizhou Li, Yusuke Monno, Masatoshi Okutomi, Yuuichi Tanaka, Seiichi Kataoka, Teruaki Kosiba'}]
Neural Rendering 神经渲染 v2
Neural Radiance Fields
3D reconstruction
novel view synthesis
outdoor scenes
Input: Monocular video clips captured by a video recorder mounted on a car.
Step1: Segmentation mask generation using Grounded SAM.
Step2: Handling transient objects by excluding them from training.
Step3: Modeling the sky with a specialized representation.
Step4: Regularizing the ground plane to conform to planar geometry.
Step5: Adapting to inconsistent lighting through appearance embeddings.
Output: Improved novel view synthesis quality with fewer artifacts.
9.5 [9.5] 2503.14274 Improving Adaptive Density Control for 3D Gaussian Splatting
[{'name': 'Glenn Grubert, Florian Barthel, Anna Hilsmann, Peter Eisert'}]
3D Gaussian Splatting 三维高斯点云 v2
3D reconstruction
Gaussian Splatting
novel view synthesis
Input: Multi-view images 多视角图像
Step1: Adaptive density control for Gaussian management 自适应密度控制以管理高斯
Step2: Implement exponential gradient thresholding 实施指数梯度阈值
Step3: Calculate corrected scene extent 计算纠正后的场景范围
Step4: Execute significance-aware pruning 执行重要性感知修剪
Output: Enhanced rendering quality 改进的渲染质量
9.5 [9.5] 2503.14346 3D Densification for Multi-Map Monocular VSLAM in Endoscopy
[{'name': "X. Anad\'on, Javier Rodr\'iguez-Puigvert, J. M. M. Montiel"}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
endoscopy
visual SLAM
CudaSIFT
depth estimation
Input: Monocular endoscopic sequences 单目内窥镜序列
Step1: Remove outliers 去除异常值
Step2: Densify maps 加密地图
Step3: Align predictions and submaps 对齐预测和子地图
Output: Reliable densified 3D maps 可靠的加密3D地图
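Outlier removal on a sparse map (Step1 above) is typically statistical: drop points whose nearest-neighbour distances are far above the cloud-wide average. A minimal sketch under that assumption (not the paper's pipeline):

```python
import math

# Hypothetical sketch of statistical outlier removal on a 3D point cloud:
# discard points whose mean distance to their k nearest neighbours exceeds
# the cloud-wide mean by more than `std_ratio` standard deviations.
# O(n^2) brute force; real systems would use a spatial index.
def remove_outliers(points, k=2, std_ratio=2.0):
    mean_dists = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points if q is not p)
        mean_dists.append(sum(dists[:k]) / k)
    mu = sum(mean_dists) / len(mean_dists)
    var = sum((d - mu) ** 2 for d in mean_dists) / len(mean_dists)
    sigma = math.sqrt(var)
    return [p for p, d in zip(points, mean_dists) if d <= mu + std_ratio * sigma]
```
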
9.5 [9.5] 2503.14445 Bolt3D: Generating 3D Scenes in Seconds
[{'name': 'Stanislaw Szymanowicz, Jason Y. Zhang, Pratul Srinivasan, Ruiqi Gao, Arthur Brussee, Aleksander Holynski, Ricardo Martin-Brualla, Jonathan T. Barron, Philipp Henzler'}]
3D Reconstruction and Modeling 三维重建 v2
3D scene generation
latent diffusion model
multiview images
Input: One or multiple images 一张或多张图像
Step1: Create large-scale multiview-consistent dataset 创建大规模多视角一致性数据集
Step2: Train latent diffusion model 训练潜在扩散模型
Step3: Generate 3D scene representation 生成三维场景表示
Output: Fast 3D scene representation generation 快速生成三维场景表示
9.5 [9.5] 2503.14463 SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model
[{'name': 'Yucheng Mao, Boyang Wang, Nilesh Kulkarni, Jeong Joon Park'}]
3D Reconstruction 三维重建 v2
3D reconstruction
image restoration
multi-view
diffusion model
Input: Multi-view images 多视角图像
Step1: Jointly denoise multiple photographs 联合去噪多个影像
Step2: Implement a multi-view diffusion model 实施多视角扩散模型
Step3: Maintain 3D consistency 维护三维一致性
Output: Restored images with improved quality 修复后图像,质量提升
9.5 [9.5] 2503.14483 Multi-view Reconstruction via SfM-guided Monocular Depth Estimation
[{'name': 'Haoyu Guo, He Zhu, Sida Peng, Haotong Lin, Yunzhi Yan, Tao Xie, Wenguan Wang, Xiaowei Zhou, Hujun Bao'}]
3D Reconstruction 三维重建 v2
3D Reconstruction 三维重建
Monocular Depth Estimation 单目深度估计
SfM-guided Reconstruction SfM引导重建
Multi-view Geometry 多视角几何
Input: Multi-view images 多视角图像
Step1: Recover the SfM point cloud 恢复SfM点云
Step2: Inject SfM information into the diffusion model 将SfM信息注入扩散模型
Step3: Predict depth maps 预测深度图
Step4: Fuse depth maps for 3D reconstruction 进行深度图融合以实现3D重建
Output: High-quality 3D models 高质量的3D模型
9.2 [9.2] 2503.13869 Robust3D-CIL: Robust Class-Incremental Learning for 3D Perception
[{'name': 'Jinge Ma, Jiangpeng He, Fengqing Zhu'}]
3D Perception 3D感知 v2
3D perception
class-incremental learning
autonomous driving
Input: 3D point cloud data 3D点云数据
Step1: Develop a robust 3D point cloud class-incremental learning framework 设计一个稳健的3D点云类增量学习框架
Step2: Implement an exemplar selection strategy based on Farthest Point Sampling 实施基于最远点采样的样本选择策略
Step3: Introduce a point cloud downsampling-based replay method 引入基于点云降采样的重放方法
Output: Improved adaptability and robustness in 3D perception models 输出: 提高3D感知模型的适应性和稳健性
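Farthest Point Sampling (Step2 above) is a standard greedy routine for picking well-spread exemplars from a point set. A minimal sketch (the seeding rule is an assumption, not the paper's exact strategy):

```python
import math

# Minimal farthest point sampling (FPS): repeatedly add the point farthest
# from the already-selected set. Starting index is fixed here for
# illustration; implementations often seed randomly.
def farthest_point_sampling(points, k, start=0):
    selected = [start]
    min_dist = [math.dist(points[start], p) for p in points]
    while len(selected) < k:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        # Each point's distance to the selected set can only shrink.
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(points[nxt], p))
    return selected
```
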
9.2 [9.2] 2503.13952 SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model
[{'name': 'Xinqing Li, Ruiqi Song, Qingyu Xie, Ye Wu, Nanxin Zeng, Yunfeng Ai'}]
Autonomous Driving 自动驾驶 v2
simulator-conditioned scene generation
autonomous driving
data generation
Input: Simulation conditions based on real-world data 真实数据的模拟条件
Step1: Scene simulation for data generation 场景模拟以生成数据
Step2: Label alignment with real-world conditions 标签与真实世界条件的对齐
Step3: Benchmark evaluation for generated data 生成数据的基准评估
Output: Large-scale diverse datasets for autonomous driving applications 大规模多样化数据集,用于自动驾驶应用
9.2 [9.2] 2503.13982 A-SCoRe: Attention-based Scene Coordinate Regression for wide-ranging scenarios
[{'name': 'Huy-Hoang Bui, Bach-Thuan Bui, Quang-Vinh Tran, Yasuyuki Fujii, Joo-Ho Lee'}]
Visual Localization 视觉定位 v2
scene coordinate regression
visual localization
robotics
Input: Images from multiple modalities 多种模式的图像
Step1: Descriptor extraction 描述符提取
Step2: Attention-based scene coordinate regression 基于注意力的场景坐标回归
Step3: Camera pose estimation 相机姿态估计
Output: Estimated camera poses 估计的相机姿态
9.2 [9.2] 2503.14493 State Space Model Meets Transformer: A New Paradigm for 3D Object Detection
[{'name': 'Chuxin Wang, Wenfei Yang, Xiang Liu, Tianzhu Zhang'}]
3D Object Detection 3D目标检测 v2
3D object detection
state space model
transformer
Input: 3D point clouds 3D点云
Step1: Model state-dependent parameters 模型状态依赖参数
Step2: Implement interaction mechanisms 实现互动机制
Step3: Conduct experiments on datasets 在数据集上进行实验
Output: Enhanced object detection performance 改进的目标检测性能
9.2 [9.2] 2503.14498 Tracking Meets Large Multimodal Models for Driving Scenario Understanding
[{'name': 'Ayesha Ishaq, Jean Lahoud, Fahad Shahbaz Khan, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer'}]
Autonomous Driving 自动驾驶 v2
Large Multimodal Models
Autonomous Driving
3D Spatial Understanding
Input: Tracking information and visual data 跟踪信息和视觉数据
Step1: Integrate tracking data into Large Multimodal Models (LMMs) 将跟踪数据集成到大型多模态模型中
Step2: Self-supervised pretraining of the tracking encoder 跟踪编码器的自监督预训练
Step3: Enhance perception, planning, and prediction tasks 增强感知、规划和预测任务
Output: Improved decision-making in dynamic driving environments 输出:在动态驾驶环境中改善决策
8.5 [8.5] 2503.13778 Using 3D reconstruction from image motion to predict total leaf area in dwarf tomato plants
[{'name': 'Dmitrii Usenko, David Helman, Chen Giladi'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D reconstruction
leaf area estimation
machine learning
precision agriculture
Input: Sequential 3D reconstructions from RGB images 从RGB图像的序列3D重建
Step1: Data integration 数据集成
Step2: 3D reconstruction algorithms development 3D重建算法开发
Step3: Leaf area estimation 叶面积估计
Output: Estimated total leaf area (TLA) 估计的总叶面积(TLA)
8.5 [8.5] 2503.13792 Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
[{'name': 'Xinyu Tian, Shu Zou, Zhaoyuan Yang, Jing Zhang'}]
VLM & VLA 视觉语言模型与对齐 v2
Vision-Language Models (VLMs) 视觉语言模型
Position Bias 位置偏差
Multi-Image Reasoning 多图像推理
Input: Multi-image inputs 多图像输入
Step1: Introduce Position-wise Question Answering (PQA) 引入位置敏感问答任务
Step2: Analyze position bias 分析位置偏差
Step3: Propose SoFt Attention (SoFA) 提出SoFt Attention方法
Output: Mitigated position bias 减轻位置偏差
8.5 [8.5] 2503.13858 MamBEV: Enabling State Space Models to Learn Birds-Eye-View Representations
[{'name': 'Hongyu Ke, Jack Morris, Kentaro Oguchi, Xiaofei Cao, Yongkang Liu, Haoxin Wang, Yi Ding'}]
3D Reconstruction and Modeling 三维重建 v2
3D visual perception
autonomous driving
bird's-eye view
Input: Multi-camera images 多摄像头图像
Step1: Spatial Cross Mamba integration 空间交叉Mamba集成
Step2: Unified BEV representation generation 统一的BEV表示生成
Step3: Computational efficiency assessment 计算效率评估
Output: Enhanced BEV representation 改进的BEV表示
8.5 [8.5] 2503.13891 Where do Large Vision-Language Models Look at when Answering Questions?
[{'name': 'Xiaoying Xing, Chia-Wen Kuo, Li Fuxin, Yulei Niu, Fan Chen, Ming Li, Ying Wu, Longyin Wen, Sijie Zhu'}]
VLM & VLA 视觉语言模型与视觉语言对齐 v2
Vision-Language Models 视觉语言模型
visual attention 视觉关注
multimodal tasks 多模态任务
Input: Large Vision-Language Models (LVLMs) 大型视觉语言模型
Step1: Extend heatmap visualization methods 扩展热图可视化方法
Step2: Select visually relevant tokens 选择视觉相关标记
Step3: Conduct analysis on LVLMs 进行LVLM分析
Output: Insights into visual understanding and attention regions 输出:视觉理解和注意区域的洞察
8.5 [8.5] 2503.13926 Learning Shape-Independent Transformation via Spherical Representations for Category-Level Object Pose Estimation
[{'name': 'Huan Ren, Wenfei Yang, Xiang Liu, Shifeng Zhang, Tianzhu Zhang'}]
3D Pose Estimation 3D姿态估计 v2
object pose estimation
spherical representations
3D reconstruction
Input: Observed object points 观察目标点
Step1: Feature extraction 特征提取
Step2: Spherical projection to HEALPix grids 将点投影到HEALPix网格
Step3: Correspondence prediction 对应关系预测
Output: Predict object pose and size 预测目标的姿态和尺寸
8.5 [8.5] 2503.13938 ChatBEV: A Visual Language Model that Understands BEV Maps
[{'name': 'Qingyao Xu, Siheng Chen, Guang Chen, Yanfeng Wang, Ya Zhang'}]
VLM & VLA 视觉语言模型与视觉语言对齐 v2
BEV maps
traffic scene understanding
Vision-Language Models
autonomous driving
Input: BEV maps (Bird's-Eye View maps) BEV地图
Step1: Dataset construction using novel collection pipeline 数据集构建使用新收集管道
Step2: Fine-tune vision-language model ChatBEV on the dataset 在数据集上微调视觉语言模型ChatBEV
Step3: Implement language-driven traffic scene generation pipeline 实施语言驱动的交通场景生成管道
Output: Enhanced understanding and generation of traffic scenarios 改进的交通场景理解与生成
8.5 [8.5] 2503.13946 Is Discretization Fusion All You Need for Collaborative Perception?
[{'name': 'Kang Yang, Tianci Bu, Lantao Li, Chunxu Li, Yongcai Wang, Deying Li'}]
Autonomous Systems and Robotics 自动驾驶与机器人 v2
Collaborative perception 协作感知
3D object detection 三维物体检测
Input: Features from multi-view images 由多视角图像提取的特征
Step1: Generate anchor proposals 生成锚点提案
Step2: Select confident features 选择自信特征
Step3: Perform local-global fusion 执行局部-全局融合
Output: Enhanced object detection improvements 改进的物体检测结果
8.5 [8.5] 2503.13951 FrustumFusionNets: A Three-Dimensional Object Detection Network Based on Tractor Road Scene
[{'name': 'Lili Yang, Mengshuai Chang, Xiao Guo, Yuxin Feng, Yiwen Mei, Caicong Wu'}]
3D Object Detection 三维对象检测 v2
3D object detection 三维对象检测
frustum-based methods 棱锥法
agricultural machinery 农业机械
Input: Multi-source sensor data (LiDAR and camera) 输入: 多源传感器数据(激光雷达和相机)
Step1: Generate 2D object detection results to narrow search areas in 3D point cloud 第一步: 生成二维对象检测结果以缩小三维点云的搜索区域
Step2: Apply Gaussian mask to enhance point cloud information 第二步: 应用高斯掩模以增强点云信息
Step3: Extract features from both frustum point cloud and crop images 第三步: 从棱锥点云和作物图像中提取特征
Output: Concatenated features for 3D object detection 输出: 用于三维对象检测的连接特征
8.5 [8.5] 2503.14001 Multimodal Feature-Driven Deep Learning for the Prediction of Duck Body Dimensions and Weight
[{'name': 'Yi Xiao, Qiannan Han, Guiping Liang, Hongyan Zhang, Song Wang, Zhihao Xu, Weican Wan, Chuang Li, Guitao Jiang, Wenbo Xiao'}]
3D Reconstruction and Modeling 三维重建 v2
3D point clouds
multimodal data
deep learning
weight estimation
body dimension prediction
Input: 2D RGB images, depth images, 3D point clouds from multiple views 2D RGB图像、深度图像和来自多个视角的3D点云
Step1: Data collection and preprocessing 数据收集与预处理
Step2: Feature extraction using PointNet++ 特征提取使用PointNet++
Step3: Fusion of 2D and 3D features 2D和3D特征融合
Step4: Model training and evaluation 模型训练与评估
Output: Predicted body dimensions and weight 预测的体型尺寸和体重
8.5 [8.5] 2503.14097 SCJD: Sparse Correlation and Joint Distillation for Efficient 3D Human Pose Estimation
[{'name': 'Weihong Chen, Xuemiao Xu, Haoxin Yang, Yi Xie, Peng Xiao, Cheng Xu, Huaidong Zhang, Pheng-Ann Heng'}]
3D Reconstruction and Modeling 三维重建 v2
3D human pose estimation
knowledge distillation
Input: Multi-frame input sequences 多帧输入序列
Step1: Sparse correlation input sequence downsampling 稀疏相关输入序列下采样
Step2: Dynamic joint spatial attention distillation 动态关节空间注意力蒸馏
Step3: Temporal consistency distillation 时间一致性蒸馏
Output: Accurate 3D human pose predictions 精确的三维人体姿态预测
8.5 [8.5] 2503.14154 RBFIM: Perceptual Quality Assessment for Compressed Point Clouds Using Radial Basis Function Interpolation
[{'name': 'Zhang Chen, Shuai Wan, Siyu Ren, Fuzheng Yang, Mengting Yu, Junhui Hou'}]
Point Cloud Processing 点云处理 v2
point cloud
quality assessment
perceptual quality
compression
Input: Distorted point clouds 失真点云
Step1: Convert discrete point features to continuous feature function 将离散点特征转换为连续特征函数
Step2: Establish bijective feature sets 建立双射特征集
Step3: Evaluate perceptual quality 评估感知质量
Output: Enhanced quality assessment 改进的质量评估
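Step1 above, turning discrete per-point features into a continuous function, is what radial basis function interpolation does. A minimal Gaussian-RBF sketch (illustrative; the kernel and solver are assumptions, not the RBFIM formulation):

```python
import math

# Hypothetical sketch of Gaussian RBF interpolation: fit weights so the
# continuous function passes through the given (point, value) samples.
def fit_rbf(points, values, eps=1.0):
    n = len(points)
    # Build the system A w = values with A[i][j] = exp(-(eps * r_ij)^2).
    a = [[math.exp(-(eps * math.dist(points[i], points[j])) ** 2)
          for j in range(n)] for i in range(n)]
    w = solve(a, list(values))

    def f(x):
        return sum(wj * math.exp(-(eps * math.dist(x, pj)) ** 2)
                   for wj, pj in zip(w, points))
    return f

def solve(a, b):
    """Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            m = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= m * a[col][c]
            b[r] -= m * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x
```

Because the Gaussian kernel matrix is positive definite, the fitted function reproduces the samples exactly and can then be evaluated at arbitrary positions for quality comparison.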
8.5 [8.5] 2503.14171 Lightweight Gradient-Aware Upscaling of 3D Gaussian Splatting Images
[{'name': 'Simon Niedermayr, Christoph Neuhauser, R\"udiger Westermann'}]
Neural Rendering 神经渲染 v2
3D Gaussian Splatting
image upscaling
novel view synthesis
Input: Low-resolution 3D Gaussian Splatting renderings 低分辨率3D高斯点云渲染
Step1: Image gradient analysis 图像梯度分析
Step2: Gradient-based bicubic spline interpolation 基于梯度的双三次样条插值
Step3: Integration into 3DGS optimization 将其集成到3DGS优化中
Output: High-resolution images with enhanced quality 高分辨率图像和增强质量
8.5 [8.5] 2503.14244 Deep Unsupervised Segmentation of Log Point Clouds
[{'name': 'Fedor Zolotarev, Tuomas Eerola, Tomi Kauppi'}]
Point Cloud Processing 点云处理 v2
point cloud segmentation
timber logs
3D reconstruction
Input: Surface point clouds 表面点云
Step1: Unsupervised segmentation 无监督分割
Step2: Geometrical property analysis 几何属性分析
Step3: Model evaluation 模型评估
Output: Accurate log surface points 准确的原木表面点
8.5 [8.5] 2503.14359 ImViD: Immersive Volumetric Videos for Enhanced VR Engagement
[{'name': 'Zhengxian Yang, Shi Pan, Shengqi Wang, Haoxiang Wang, Li Lin, Guanjun Li, Zhengqi Wen, Borong Lin, Jianhua Tao, Tao Yu'}]
3D Reconstruction and Modeling 三维重建 v2
immersive volumetric videos
3D reconstruction
multi-view capture
Input: Multi-view, multi-modal audio-video data 多视角, 多模态音视频数据
Step1: Data capture 进行数据捕获
Step2: Benchmarking existing methods 对现有方法进行基准测试
Step3: Developing a pipeline for reconstruction 开发重建管道
Output: Immersive volumetric videos 生成沉浸式体积视频
8.5 [8.5] 2503.14405 DUNE: Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers
[{'name': 'Mert Bulent Sariyildiz, Philippe Weinzaepfel, Thomas Lucas, Pau de Jorge, Diane Larlus, Yannis Kalantidis'}]
3D Understanding 3D理解 v2
heterogeneous teacher distillation
depth estimation
3D understanding
Input: Various heterogeneous teacher models 诸多异构教师模型
Step1: Define heterogeneous teacher distillation 定义异构教师蒸馏
Step2: Explore data-sharing strategies 探索数据共享策略
Step3: Design and evaluate the projector architecture 设计并评估投影器架构
Output: Universal encoder capable of 2D and 3D tasks 能够进行2D和3D任务的通用编码器
8.5 [8.5] 2503.14489 Stable Virtual Camera: Generative View Synthesis with Diffusion Models
[{'name': 'Jensen (Jinghao) Zhou, Hang Gao, Vikram Voleti, Aaryaman Vasishta, Chun-Han Yao, Mark Boss, Philip Torr, Christian Rupprecht, Varun Jampani'}]
Multi-view and Stereo Vision 多视角和立体视觉 v2
Novel View Synthesis 新视图合成
Diffusion Models 扩散模型
3D Reconstruction 三维重建
Input: Any number of input views and target cameras 任意数量的输入视图和目标相机
Step1: Model design 模型设计
Step2: Training strategy 训练策略
Step3: Sampling method 采样方法
Output: Novel views of a scene 场景的新视图
8.5 [8.5] 2503.14492 Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control
[{'name': 'NVIDIA: Hassan Abu Alhaija, Jose Alvarez, Maciej Bala, Tiffany Cai, Tianshi Cao, Liz Cha, Joshua Chen, Mike Chen, Francesco Ferroni, Sanja Fidler, Dieter Fox, Yunhao Ge, Jinwei Gu, Ali Hassani, Michael Isaev, Pooya Jannaty, Shiyi Lan, Tobias Lasser, Huan Ling, Ming-Yu Liu, Xian Liu, Yifan Lu, Alice Luo, Qianli Ma, Hanzi Mao, Fabio Ramos, Xuanchi Ren, Tianchang Shen, Shitao Tang, Ting-Chun Wang, Jay Wu, Jiashu Xu, Stella Xu, Kevin Xie, Yuchong Ye, Xiaodong Yang, Xiaohui Zeng, Yu Zeng'}]
Image and Video Generation 图像生成 v2
world generation
diffusion models
autonomous driving
robotics
Input: Conditional multi-modal inputs (segmentation, depth, edge) 条件多模态输入(分割,深度,边缘)
Step1: Adaptive weighting of conditional inputs 自适应加权条件输入
Step2: World generation using Conditional Diffusion Model 使用条件扩散模型生成世界
Output: Real-time world simulations 实时世界模拟
8.5 [8.5] 2503.14501 Advances in 4D Generation: A Survey
[{'name': 'Qiaowei Miao, Kehan Li, Jinsheng Quan, Zhiyuan Min, Shaojie Ma, Yichao Xu, Yi Yang, Yawei Luo'}]
Image and Video Generation 图像生成与视频生成 v2
4D generation
autonomous driving
dynamic modeling
Input: 4D data representations 4D数据表示
Step1: Survey of existing technologies 现有技术的调查
Step2: Literature review 文献综述
Step3: Challenges and opportunities analysis 挑战与机遇分析
Output: Comprehensive understanding of 4D generation 4D生成的全面理解
8.0 [8.0] 2503.13652 Web Artifact Attacks Disrupt Vision Language Models
[{'name': 'Maan Qraitem, Piotr Teterwak, Kate Saenko, Bryan A. Plummer'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
artifact attacks
model robustness
Input: Vision-language models (VLMs) 视觉语言模型
Step1: Identify artifact-based attacks 识别伪影攻击
Step2: Develop automated mining pipeline 开发自动化挖掘管道
Step3: Optimize attacks and evaluate effectiveness 优化攻击并评估有效性
Output: Enhanced understanding of model vulnerabilities 改进对模型脆弱性的理解
8.0 [8.0] 2503.13939 Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models
[{'name': 'Yuxiang Lai, Jike Zhong, Ming Li, Shitian Zhao, Xiaofeng Yang'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
Medical Imaging
Reinforcement Learning
Input: Medical imaging data 医学图像数据
Step1: Implement reinforcement learning framework 实施强化学习框架
Step2: Optimize reasoning paths using GRPO 优化推理路径使用GRPO
Step3: Evaluate model across different imaging modalities 评估模型在不同成像模式下的性能
Output: Enhanced generalization and trustworthiness 增强的泛化和可信性
7.5 [7.5] 2503.13966 FlexVLN: Flexible Adaptation for Diverse Vision-and-Language Navigation Tasks
[{'name': 'Siqi Zhang, Yanyuan Qiao, Qunbo Wang, Longteng Guo, Zhihua Wei, Jing Liu'}]
VLM & VLA 视觉语言模型与视觉语言对齐 v2
Vision-and-Language Navigation
Large Language Models
Input: Visual input and natural language instructions 视觉输入与自然语言指令
Step1: Generate high-level navigation plan 生成高层导航计划
Step2: Validate guidance feasibility 验证指导的可行性
Step3: Execute navigation actions 执行导航动作
Output: Target location reached 到达目标位置
7.5 [7.5] 2503.14161 CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models
[{'name': 'Yiqi Zhu, Ziyue Wang, Can Zhang, Peng Li, Yang Liu'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
Continuous Space Perception
Spatial Reasoning
Input: Multi-image sequences 多图像序列
Step1: Define continuous space perception 定义连续空间感知
Step2: Develop benchmark tasks 开发基准任务
Step3: Evaluate models across tasks 在任务中评估模型
Output: Performance metrics 性能指标
7.5 [7.5] 2503.14277 Towards synthetic generation of realistic wooden logs
[{'name': 'Fedor Zolotarev, Borek Reich, Tuomas Eerola, Tomi Kauppi, Pavel Zemcik'}]
3D Generation 三维生成 v2
3D representation
synthetic generation
wooden logs
Input: Specifications of wooden logs 木材参数
Step1: Internal knot generation 内部结的生成
Step2: Centerline generation 中心线的生成
Step3: Surface generation 表面生成
Output: Realistic 3D models of wooden logs 逼真的木材三维模型
7.5 [7.5] 2503.14402 Diffusion-based Facial Aesthetics Enhancement with 3D Structure Guidance
[{'name': 'Lisha Li, Jingwen Hou, Weide Liu, Yuming Fang, Jiebin Yan'}]
Image Generation 图像生成 v2
Facial Aesthetics Enhancement
3D structure guidance
Diffusion model
Facial beautification
Input: 2D facial images 2D面部图像
Step1: Nearest Neighbor Face Searching (NNFS) module 寻找最近邻面孔
Step2: Facial Guidance Extraction (FGE) module 提取面部引导
Step3: Face Beautification (FB) module 面部美化
Output: Enhanced facial images 改进的面部图像
7.0 [7.0] 2503.14075 Growing a Twig to Accelerate Large Vision-Language Models
[{'name': 'Zhenwei Shao, Mingyang Wang, Zhou Yu, Wenwen Pan, Yan Yang, Tao Wei, Hongyuan Zhang, Ning Mao, Wei Chen, Jun Yu'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models 视觉语言模型
VLM acceleration VLM加速
Token pruning 标记修剪
Input: Base VLM architecture 基础视觉语言模型架构
Step1: Twig-guided token pruning twig引导的标记修剪
Step2: Self-speculative decoding 自我推测解码
Output: Accelerated VLM performance 加速的视觉语言模型性能

Arxiv 2025-03-12

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2503.07739 SIRE: SE(3) Intrinsic Rigidity Embeddings
[{'name': 'Cameron Smith, Basile Van Hoorick, Vitor Guizilini, Yue Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
dynamic scene reconstruction
self-supervised learning
Input: Casually captured videos 随意拍摄的视频
Step1: Estimate scene rigidity and geometry 估计场景刚性与几何
Step2: Use a least-squares solver to lift 2D trajectories into SE(3) tracks 使用最小二乘解算器将2D轨迹提升至SE(3)轨迹
Step3: Re-project back to 2D and compare against original trajectories 重新投影回2D并与原始轨迹比较
Output: Rigid scene structure and embeddings 刚性场景结构及嵌入
9.5 [9.5] 2503.07743 SANDRO: a Robust Solver with a Splitting Strategy for Point Cloud Registration
[{'name': 'Michael Adlerstein, João Carlos Virgolino Soares, Angelo Bratta, Claudio Semini'}]
3D Reconstruction and Modeling 三维重建 v2
point cloud registration
3D modeling
Input: Point cloud data 点云数据
Step1: Initial outlier detection 初始异常值检测
Step2: Robust optimization with GNC 采用GNC的稳健优化
Step3: Splitting strategy implementation 拆分策略实现
Output: Accurate point cloud alignment 准确的点云对齐
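The alignment step at the heart of point cloud registration pipelines like this one has a classical closed-form core. Below is a minimal Kabsch/SVD sketch that recovers a rigid transform from known correspondences — an illustration of the underlying building block, not SANDRO's GNC-based robust solver; all function and variable names are our own:

```python
import numpy as np

def kabsch_align(src, dst):
    """Closed-form rigid alignment: find R, t minimizing ||R @ src_i + t - dst_i||.
    src, dst: (N, 3) arrays of corresponding points."""
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c                      # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Recover a known rotation about z and a translation from noise-free points
rng = np.random.default_rng(0)
src = rng.standard_normal((50, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
dst = src @ R_true.T + t_true
R_est, t_est = kabsch_align(src, dst)
print(np.allclose(R_est, R_true) and np.allclose(t_est, t_true))  # → True
```

Robust solvers such as the one in this paper wrap a step like this inside an outlier-aware optimization; the closed form only applies once correspondences are trusted.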
9.5 [9.5] 2503.07819 POp-GS: Next Best View in 3D-Gaussian Splatting with P-Optimality
[{'name': 'Joey Wilson, Marcelino Almeida, Sachit Mahajan, Martin Labrie, Maani Ghaffari, Omid Ghasemalizadeh, Min Sun, Cheng-Hao Kuo, Arnab Sen'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
active perception
uncertainty quantification
Input: Multi-view images 多视角图像
Step1: Derivation of covariance matrix 协方差矩阵的推导
Step2: Application of optimal experimental design 最优实验设计的应用
Step3: Quantification of information gain 信息增益的量化
Output: Enhanced strategies for 3D Gaussian Splatting 改进的三维高斯点云策略
9.5 [9.5] 2503.07828 Neural Radiance and Gaze Fields for Visual Attention Modeling in 3D Environments
[{'name': 'Andrei Chubarau, Yinan Wang, James J. Clark'}]
Neural Rendering 神经渲染 v2
Neural Radiance Fields
visual attention
3D environments
gaze prediction
Input: 2D images of a 3D scene 3D场景的2D图像
Step1: NeRF training NeRF训练
Step2: Gaze prediction network training 注视预测网络训练
Step3: Gaze visualization and mapping to 3D structure 注意力可视化和3D结构映射
Output: Visual attention patterns and rendered images 可视化注意模式和渲染图像
9.5 [9.5] 2503.07874 Topology-Preserving Loss for Accurate and Anatomically Consistent Cardiac Mesh Reconstruction
[{'name': 'Chenyu Zhang, Yihao Luo, Yinzhe Wu, Choon Hwai Yap, Guang Yang'}]
3D Reconstruction and Modeling 三维重建 v2
cardiac mesh reconstruction
topology-preserving loss
Input: Volumetric data 体积数据
Step1: Identify topology-violating points 确定违反拓扑结构的点
Step2: Apply Topology-Preserving Mesh Loss 应用拓扑保护网格损失
Step3: Perform mesh deformation and optimization 执行网格形变与优化
Output: Accurate and anatomically consistent cardiac meshes 准确且解剖上一致的心脏网格
9.5 [9.5] 2503.07940 BUFFER-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes
[{'name': 'Minkyun Seo, Hyungtae Lim, Kanghee Lee, Luca Carlone, Jaesik Park'}]
Point Cloud Processing 点云处理 v2
Point Cloud Registration 点云配准
Generalization 泛化能力
Zero-Shot Learning 零样本学习
Input: Point cloud data 点云数据
Step1: Identify limitations of existing methods 识别现有方法的局限性
Step2: Develop zero-shot registration framework 开发零样本配准框架
Step3: Implement adaptive voxel size and search radii 实现自适应体素大小和搜索半径
Output: Robust point cloud registration pipeline 稳健的点云配准流程
9.5 [9.5] 2503.07952 NeRF-VIO: Map-Based Visual-Inertial Odometry with Initialization Leveraging Neural Radiance Fields
[{'name': 'Yanyu Zhang, Dongming Wang, Jie Xu, Mengyuan Liu, Pengxiang Zhu, Wei Ren'}]
3D Reconstruction and Modeling 三维重建 v2
visual-inertial odometry
neural radiance fields
augmented reality
Input: Captured images and pre-trained NeRF model 采集的图像和预训练的NeRF模型
Step1: Initialize first IMU state 初始化第一个IMU状态
Step2: Define loss function based on geodesic distance 构建基于测地距离的损失函数
Step3: Integrate captured and rendered images 整合采集图像与渲染图像
Output: Updated poses and NeRF-based rendering 更新的位姿和基于NeRF的渲染
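Step2 above mentions a loss based on geodesic distance; for rotations this is the angle of the relative rotation on SO(3). A minimal sketch of that distance (a textbook formula, not the paper's full loss; names are our own):

```python
import numpy as np

def so3_geodesic(R1, R2):
    """Geodesic distance (radians) between two rotation matrices:
    the rotation angle of the relative rotation R1^T R2."""
    cos_angle = (np.trace(R1.T @ R2) - 1.0) / 2.0
    # clip guards against values slightly outside [-1, 1] from round-off
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))

# A 90-degree rotation about z is pi/2 away from the identity
Rz90 = np.array([[0.0, -1.0, 0.0],
                 [1.0,  0.0, 0.0],
                 [0.0,  0.0, 1.0]])
print(so3_geodesic(np.eye(3), Rz90))  # → approximately 1.5708 (pi/2)
```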
9.5 [9.5] 2503.08005 CDI3D: Cross-guided Dense-view Interpolation for 3D Reconstruction
[{'name': 'Zhiyuan Wu, Xibin Song, Senbo Wang, Weizhe Liu, Jiayu Yang, Ziang Cheng, Shenzhou Chen, Taizhang Shang, Weixuan Sun, Shan Luo, Pan Ji'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
image-to-3D generation
multi-view consistency
2D diffusion models
Input: Single RGB image 单个RGB图像
Step1: Generate main views using a 2D diffusion model 使用2D扩散模型生成主要视图
Step2: Apply Dense View Interpolation (DVI) for additional view synthesis 使用密集视图插值(DVI)进行附加视图合成
Step3: Tri-plane-based mesh reconstruction to create 3D mesh 使用三平面网格重建创建3D网格
Output: High-quality 3D meshes with improved texture and geometry 输出: 具有改善纹理和几何形状的高质量3D网格
9.5 [9.5] 2503.08092 SparseVoxFormer: Sparse Voxel-based Transformer for Multi-modal 3D Object Detection
[{'name': 'Hyeongseok Son, Jia He, Seung-In Park, Ying Min, Yunhao Zhang, ByungIn Yoo'}]
3D Object Detection 三维物体检测 v2
3D Object Detection
Sparse Voxel Features
Autonomous Driving
Input: Multi-modal data (LiDAR and camera) 多模态数据(LiDAR和相机)
Step1: Feature Extraction 特征提取
Step2: Sparse Voxel Representation 稀疏体素表示
Step3: Transformer-based Detection 基于变压器的检测
Output: Detected 3D objects 检测到的三维物体
9.5 [9.5] 2503.08093 MVGSR: Multi-View Consistency Gaussian Splatting for Robust Surface Reconstruction
[{'name': 'Chenfeng Hou, Qi Xun Yeo, Mengqi Guo, Yongxin Su, Yanyan Li, Gim Hee Lee'}]
Surface Reconstruction 表面重建 v2
3D reconstruction
Gaussian splatting
surface reconstruction
multi-view consistency
Input: Multi-view images 多视角图像
Step1: Feature extraction 特征提取
Step2: Distractor mask generation 干扰物遮罩生成
Step3: Gaussian pruning 高斯剪枝
Step4: Surface reconstruction 表面重建
Output: Enhanced 3D models 改进的三维模型
9.5 [9.5] 2503.08135 ArticulatedGS: Self-supervised Digital Twin Modeling of Articulated Objects using 3D Gaussian Splatting
[{'name': 'Junfu Guo, Yu Xin, Gaoyi Liu, Kai Xu, Ligang Liu, Ruizhen Hu'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
articulated objects
3D reconstruction
self-supervised learning
digital twins
Input: Multi-view imagery of articulated objects 多视角图像
Step 1: Concurrent part-level reconstruction 部件级同时重建
Step 2: Multi-step optimization of parameters 多步骤参数优化
Step 3: Model formation using 3D Gaussian representations 使用3D高斯模型形成
Output: Digital twins of articulated objects in 3D digital format 输出: 3D数字格式的物体数字双胞胎
9.5 [9.5] 2503.08140 HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views
[{'name': 'Ethan Griffiths, Maryam Haghighat, Simon Denman, Clinton Fookes, Milad Ramezani'}]
3D Place Recognition 3D位置识别 v2
Lidar place recognition
3D reconstruction
autonomous systems
Input: Lidar point cloud data 激光雷达点云数据
Step1: Octree-based multi-scale attention mechanism 八叉树多尺度注意机制
Step2: Relay tokens for efficient communication 采用中继标记以提高通信效率
Step3: Pyramid attentional pooling for global descriptor synthesis 采用金字塔注意池化以合成全局描述符
Output: Robust global descriptors for place recognition 输出: 用于位置识别的鲁棒全局描述符
9.5 [9.5] 2503.08142 A Framework for Reducing the Complexity of Geometric Vision Problems and its Application to Two-View Triangulation with Approximation Bounds
[{'name': 'Felix Rydell, Georg Bökman, Fredrik Kahl, Kathlén Kohn'}]
Structure from Motion (SfM) 运动结构估计 v2
3D reconstruction
triangulation
Structure-from-Motion
Input: Noisy 2D projections from multiple images 多个图像的噪声2D投影
Step1: Cost function reweighting 代价函数重加权
Step2: Simplification of polynomial degree to improve efficiency 简化多项式的程度以提高效率
Step3: Derive optimal weighting strategies 推导最佳加权策略
Output: Closed-form solution for triangulation 闭式解的三角测量
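For context on the problem this paper simplifies, the baseline linear (DLT) two-view triangulation can be sketched in a few lines — a minimal illustration of classical triangulation, not the paper's reweighted cost or its approximation bounds; the toy cameras below are our own assumptions:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) two-view triangulation.
    P1, P2: (3, 4) camera projection matrices; x1, x2: 2D image points."""
    # Each observation contributes two linear constraints on the homogeneous 3D point
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # null vector of A
    return X[:3] / X[3]        # dehomogenize

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two toy cameras: identity, and a unit baseline along x
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])
X_est = triangulate_dlt(P1, P2, project(P1, X_true), project(P2, X_true))
print(np.allclose(X_est, X_true))  # → True
```

With noisy projections the DLT minimizer is not the reprojection-error optimum, which is where reweighting schemes like the paper's come in.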
9.5 [9.5] 2503.08208 Explaining Human Preferences via Metrics for Structured 3D Reconstruction
[{'name': 'Jack Langerman, Denys Rozumnyi, Yuzhong Huang, Dmytro Mishkin'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
metrics
human preferences
Input: Structured 3D reconstructions 结构三维重建
Step1: Evaluate automated metrics 评估自动化度量
Step2: Analyze human preferences 分析人类偏好
Step3: Propose metrics and recommendations 提出度量和建议
Output: Improved metric for 3D reconstructions 改进的三维重建度量
9.5 [9.5] 2503.08217 S3R-GS: Streamlining the Pipeline for Large-Scale Street Scene Reconstruction
[{'name': 'Guangting Zheng, Jiajun Deng, Xiaomeng Chu, Yu Yuan, Houqiang Li, Yanyong Zhang'}]
3D Reconstruction and Modeling 3D重建与建模 v2
3D reconstruction
street scene
Input: Multi-view images 多视角图像
Step1: Data integration 数据集成
Step2: Algorithm development 算法开发
Step3: Model evaluation 模型评估
Output: Streamlined reconstruction pipeline 精简的重建管线
9.5 [9.5] 2503.08218 MVD-HuGaS: Human Gaussians from a Single Image via 3D Human Multi-view Diffusion Prior
[{'name': 'Kaiqiang Xiong, Ying Feng, Qi Zhang, Jianbo Jiao, Yang Zhao, Zhihao Liang, Huachen Gao, Ronggang Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D human reconstruction
multi-view diffusion model
Input: Single image 单张图像
Step1: Generate multi-view images from a single reference image 从单个参考图像生成多视角图像
Step2: Introduce an alignment module for camera poses 引入相机位姿对齐模块
Step3: Optimize 3D Gaussians and refine facial regions 优化3D高斯,并细化面部区域
Output: High-fidelity free-view 3D human rendering 输出:高保真自由视图3D人类渲染
9.5 [9.5] 2503.08219 CL-MVSNet: Unsupervised Multi-view Stereo with Dual-level Contrastive Learning
[{'name': 'Kaiqiang Xiong, Rui Peng, Zhe Zhang, Tianxing Feng, Jianbo Jiao, Feng Gao, Ronggang Wang'}]
Multi-view Stereo 多视角立体 v2
3D reconstruction
Multi-view Stereo
contrastive learning
autonomous driving
Input: Multi-view images 多视角图像
Step1: Integrate dual-level contrastive learning 双层对比学习集成
Step2: Implement image-level contrastive loss 实现图像级对比损失
Step3: Implement scene-level contrastive loss 实现场景级对比损失
Step4: L0.5 photometric consistency loss implementation L0.5光度一致性损失实现
Output: Enhanced depth estimation 改进的深度估计
9.5 [9.5] 2503.08224 HRAvatar: High-Quality and Relightable Gaussian Head Avatar
[{'name': 'Dongbin Zhang, Yunfei Liu, Lijian Lin, Ye Zhu, Kangjie Chen, Minghan Qin, Yu Li, Haoqian Wang'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
Gaussian Splatting
head avatars
Input: Monocular video input 单目视频输入
Step1: Optimize facial tracking through end-to-end training 优化面部追踪,采用端到端训练
Step2: Utilize learnable blendshapes for deformation 使用可学习的混合形状进行变形
Step3: Model head appearance using physical properties and shading techniques 使用物理属性和阴影技术建模头部外观
Output: High-fidelity, relightable 3D head avatars 输出:高保真、可重光照的三维头部头像
9.5 [9.5] 2503.08336 Talk2PC: Enhancing 3D Visual Grounding through LiDAR and Radar Point Clouds Fusion for Autonomous Driving
[{'name': 'Runwei Guan, Jianan Liu, Ningwei Ouyang, Daizong Liu, Xiaolou Sun, Lianqing Zheng, Ming Xu, Yutao Yue, Hui Xiong'}]
3D Reconstruction and Modeling 三维重建 v2
3D visual grounding
LiDAR
radar
autonomous driving
Input: Dual-sensor inputs (LiDAR and radar) 双传感器输入 (激光雷达和雷达)
Step 1: Feature extraction from LiDAR and radar sensor 数据提取: 从激光雷达和雷达传感器提取特征
Step 2: Dual-sensor feature fusion using Bidirectional Agent Cross Attention (BACA) 双传感器特征融合: 使用双向代理交叉注意力 (BACA)
Step 3: Region localization using Dynamic Gated Graph Fusion (DGGF) 区域定位: 使用动态门控图融合 (DGGF)
Output: 3D visual grounding prediction 3D视觉定位预测
9.5 [9.5] 2503.08352 Mitigating Ambiguities in 3D Classification with Gaussian Splatting
[{'name': 'Ruiqi Zhang, Hao Zhu, Jingyi Zhao, Qi Zhang, Xun Cao, Zhan Ma'}]
3D Classification 3D 分类 v2
3D classification
Gaussian Splatting
point clouds
ambiguity
Input: GS point cloud 输入: GS 点云
Step1: Analyze ambiguities in traditional point cloud 分析传统点云中的歧义
Step2: Implement Gaussian Splatting classification 实施高斯点云分类
Step3: Evaluate performance using a new dataset 通过新数据集评估性能
Output: Enhanced classification of 3D objects 输出: 改进的 3D 对象分类
9.5 [9.5] 2503.08363 Parametric Point Cloud Completion for Polygonal Surface Reconstruction
[{'name': 'Zhaiyu Chen, Yuqing Wang, Liangliang Nan, Xiao Xiang Zhu'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
point cloud completion
polygonal surfaces
Input: Incomplete point cloud 不完整的点云
Step1: Infer parametric primitives 推断参数化原始体
Step2: Recover high-level geometric structures 恢复高层次几何结构
Step3: Construct polygonal surfaces from primitives 根据原始体构建多边形表面
Output: High-quality polygonal surface reconstruction 高质量多边形表面重建
9.5 [9.5] 2503.08382 Twinner: Shining Light on Digital Twins in a Few Snaps
[{'name': 'Jesus Zarzar, Tom Monnier, Roman Shapovalov, Andrea Vedaldi, David Novotny'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
PBR
digital twins
autonomous systems
Input: Posed images 设定图像
Step1: Voxel-grid transformer 体素网格变换器
Step2: Photometric error minimization 光度误差最小化
Step3: Model evaluation and comparison 模型评估与比较
Output: 3D geometry and materials 三维几何与材料
9.5 [9.5] 2503.08407 WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images
[{'name': 'Yansong Guo, Jie Hu, Yansong Qu, Liujuan Cao'}]
3D Reconstruction and Modeling 三维重建与建模 v2
3D segmentation
2D images
real-time systems
Input: 2D images of arbitrary 3D objects 任意3D物体的2D图像
Step1: Pre-processing: 2D mask feature construction 预处理:2D掩模特征构建
Step2: Dynamic Global Aligning (DGA) for accuracy improvement 动态全局对齐(DGA)来提升精度
Step3: Multi-view Group Mapping (MGM) for real-time segmentation 多视角组映射(MGM)实现实时分割
Output: Aligned 3D segmentation results 对齐的3D分割结果
9.5 [9.5] 2503.08422 JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data
[{'name': 'Runjian Chen, Wenqi Shao, Bo Zhang, Shaoshuai Shi, Li Jiang, Ping Luo'}]
3D Reconstruction and Modeling 三维重建 v2
3D object detection
LiDAR
simulation-to-real
autonomous driving
Input: LiDAR point clouds from real and simulated environments
Step1: Jittering augmentation to enhance sample efficiency
Step2: Utilize a domain-aware backbone for better feature extraction
Step3: Implement memory-based sectorized alignment loss to bridge the simulation-to-real gap
Output: Effective 3D object detection with minimal real labels
9.5 [9.5] 2503.08511 PCGS: Progressive Compression of 3D Gaussian Splatting
[{'name': 'Yihang Chen, Mengyao Li, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, Jianfei Cai'}]
3D Reconstruction and Modeling 三维重建 v2
3D Gaussian Splatting
progressive compression
novel view synthesis
Input: 3D Gaussian Splatting data 3D高斯点云数据
Step1: Progressive masking strategy 渐进掩码策略
Step2: Progressive quantization 渐进量化
Step3: Entropy coding enhancement 熵编码优化
Output: Compressed bitstream with preserved fidelity 保真的压缩比特流
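The idea behind progressive quantization — send a coarse value first, then refine it with extra bits — can be shown with a toy scalar example. This is purely illustrative of the concept, not PCGS's actual Gaussian-attribute coding:

```python
import numpy as np

def progressive_quantize(x, levels):
    """Toy progressive scalar quantization: each level halves the step size,
    so only a small refinement residual would need to be transmitted per level.
    Returns the reconstruction after each level."""
    recons = []
    step = 1.0
    q = np.zeros_like(x)
    for _ in range(levels):
        residual = x - q
        q = q + np.round(residual / step) * step  # refine by quantized residual
        recons.append(q.copy())
        step /= 2.0
    return recons

x = np.array([0.337, -1.912, 2.05])
recons = progressive_quantize(x, levels=8)
errors = [np.max(np.abs(x - r)) for r in recons]
print(errors[0], errors[-1])  # error shrinks as more refinement levels arrive
```

Each decoded prefix of the bitstream yields a usable (coarser) reconstruction, which is what makes the scheme "progressive".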
9.5 [9.5] 2503.08516 High-Quality 3D Head Reconstruction from Any Single Portrait Image
[{'name': 'Jianfu Zhang, Yujie Gao, Jiahui Zhan, Wentao Wang, Yiyi Zhang, Haohua Zhao, Liqing Zhang'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
portrait images
facial expressions
Input: Single portrait image 单幅肖像图像
Step1: Data collection 数据收集
Step2: Multi-view video generation 多视角视频生成
Step3: Identity and expression integration 身份和表情整合
Step4: 3D head reconstruction 3D头部重建
Output: High-quality 3D head models 高质量3D头部模型
9.5 [9.5] 2503.08594 3D Point Cloud Generation via Autoregressive Up-sampling
[{'name': 'Ziqiao Meng, Qichao Wang, Zhipeng Zhou, Irwin King, Peilin Zhao'}]
3D Reconstruction and Modeling 三维重建 v2
3D point cloud generation
autoregressive modeling
up-sampling
Input: 3D point clouds 3D 点云
Step1: Learn multi-scale discrete representations 学习多尺度离散表示
Step2: Train autoregressive transformer 训练自回归变换器
Step3: Generate point clouds 生成点云
Output: Refined 3D point clouds 精炼的 3D 点云
9.5 [9.5] 2503.08601 LiSu: A Dataset and Method for LiDAR Surface Normal Estimation
[{'name': 'Dušan Malić, Christian Fruhwirth-Reisinger, Samuel Schulter, Horst Possegger'}]
3D Reconstruction 三维重建 v2
LiDAR
surface normal estimation
autonomous driving
3D reconstruction
Input: LiDAR point clouds LiDAR点云
Step1: Generate synthetic dataset 生成合成数据集
Step2: Develop surface normal estimation method 开发表面法线估计方法
Step3: Evaluate model performance 评估模型性能
Output: Accurate surface normals for 3D reconstruction 改进的三维重建表面法线
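The classical baseline for LiDAR surface normal estimation, which learned methods like this paper's are compared against, is PCA over a local neighborhood: the normal is the direction of least variance. A minimal sketch (our own illustration, not the paper's method):

```python
import numpy as np

def estimate_normal(points):
    """Estimate the surface normal of a local neighborhood as the eigenvector
    of the point covariance with the smallest eigenvalue (classic PCA)."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvecs[:, 0]                     # smallest-variance direction

# Points sampled on the z = 0 plane should give a normal along +/- z
rng = np.random.default_rng(1)
pts = np.column_stack([rng.uniform(-1, 1, 100),
                       rng.uniform(-1, 1, 100),
                       np.zeros(100)])
n = estimate_normal(pts)
print(np.abs(n))  # → close to [0, 0, 1] up to sign
```

Note the sign ambiguity: PCA alone cannot orient the normal, which on LiDAR data is usually resolved by pointing normals toward the sensor.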
9.5 [9.5] 2503.08639 GBlobs: Explicit Local Structure via Gaussian Blobs for Improved Cross-Domain LiDAR-based 3D Object Detection
[{'name': 'Dušan Malić, Christian Fruhwirth-Reisinger, Samuel Schulter, Horst Possegger'}]
3D Object Detection 3D物体检测 v2
3D object detection
domain generalization
LiDAR
Gaussian blobs
local geometry
Input: LiDAR point cloud data 激光雷达点云数据
Step1: Encode local point cloud neighborhoods using Gaussian blobs 使用高斯斑块对局部点云邻域进行编码
Step2: Integrate the Gaussian blobs into existing detection frameworks 将高斯斑块集成到现有检测框架中
Step3: Evaluate model performance on cross-domain benchmarks 在跨域基准测试中评估模型性能
Output: Enhanced detection accuracy in domain generalization 在领域泛化中提高检测精度
9.5 [9.5] 2503.08664 MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention
[{'name': 'Yuhan Wang, Fangzhou Hong, Shuai Yang, Liming Jiang, Wayne Wu, Chen Change Loy'}]
3D Generation 三维生成 v2
3D generation
multiview diffusion
human modeling
Input: Frontal image of a human figure 人物的正面图像
Step1: Establish correspondences using rasterization and projection 使用光栅化和投影建立对应关系
Step2: Introduce mesh attention to handle high resolution 引入网格注意力以处理高分辨率
Step3: Generate multiview images using the trained model 使用训练好的模型生成多视角图像
Output: Dense, view-consistent human images at megapixel resolution 输出:百万像素分辨率下的稠密一致人像图像
9.5 [9.5] 2503.08676 Language-Depth Navigated Thermal and Visible Image Fusion
[{'name': 'Jinchang Zhang, Zijun Li, Guoyu Lu'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
image fusion
depth estimation
autonomous driving
Input: Infrared and visible images, along with depth information 输入: 红外和可见图像,及深度信息
Step1: Multi-channel feature extraction using a diffusion model 步骤1: 使用扩散模型进行多通道特征提取
Step2: Language-guided fusion with depth information 步骤2: 结合深度信息的语言指导融合
Step3: Depth estimation and optimization of the fusion network 步骤3: 深度估计并优化融合网络
Output: Enhanced color-fused images 输出: 改进的彩色融合图像
9.2 [9.2] 2503.07946 7DGS: Unified Spatial-Temporal-Angular Gaussian Splatting
[{'name': 'Zhongpai Gao, Benjamin Planche, Meng Zheng, Anwesa Choudhuri, Terrence Chen, Ziyan Wu'}]
Neural Rendering 神经渲染 v2
real-time rendering
Gaussian Splatting
dynamic scenes
Input: Scene elements represented as 7D Gaussians 场景元素以7D高斯表示
Step1: Conditional slicing mechanism 条件切片机制
Step2: Joint optimization integration 联合优化集成
Step3: Rendering of dynamic scenes 渲染动态场景
Output: Real-time rendering with view-dependent effects 输出:支持视图依赖的实时渲染
9.2 [9.2] 2503.08101 Accelerate 3D Object Detection Models via Zero-Shot Attention Key Pruning
[{'name': 'Lizhen Xu, Xiuxiu Bai, Xiaojun Jia, Jianwu Fang, Shanmin Pang'}]
3D Object Detection 3D目标检测 v2
3D object detection 3D目标检测
zero-shot pruning零样本剪枝
transformer decoders 变换器解码器
Input: 3D object detection models 3D目标检测模型
Step1: Classification score extraction 分类评分提取
Step2: Importance score computation 重要性评分计算
Step3: Key pruning based on importance 依据重要性进行关键字剪枝
Output: Accelerated inference speed 加速的推理速度
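The pruning step above boils down to ranking attention keys by an importance score and keeping the top-k. A toy sketch of that selection (the scores below are made up for illustration; this is not the paper's exact scoring from classification logits):

```python
import numpy as np

def prune_keys(keys, values, importance, keep_ratio=0.5):
    """Keep only the top-k key/value pairs ranked by an importance score.
    importance: (N,) array, e.g. derived from per-query classification scores."""
    k = max(1, int(len(importance) * keep_ratio))
    top = np.argsort(importance)[-k:]        # indices of the k largest scores
    return keys[top], values[top]

keys = np.arange(12, dtype=float).reshape(6, 2)
values = keys * 10.0
importance = np.array([0.1, 0.9, 0.3, 0.8, 0.2, 0.7])
k_kept, v_kept = prune_keys(keys, values, importance, keep_ratio=0.5)
print(len(k_kept))  # → 3
```

Because the score needs no retraining to compute, a criterion like this can be applied zero-shot to a pretrained detector, shrinking the attention cost per decoder layer.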
9.0 [9.0] 2503.08373 nnInteractive: Redefining 3D Promptable Segmentation
[{'name': 'Fabian Isensee, Maximilian Rokuss, Lars Krämer, Stefan Dinkelacker, Ashis Ravindran, Florian Stritzke, Benjamin Hamm, Tassilo Wald, Moritz Langenberg, Constantin Ulrich, Jonathan Deissler, Ralf Floca, Klaus Maier-Hein'}]
3D Segmentation 三维分割 v2
3D segmentation
interactive segmentation
volumetric data
Input: User prompts (points, scribbles, bounding boxes, lasso) 用户提示(点、涂鸦、边界框、套索)
Step1: Data integration from volumetric datasets 从体积数据集中进行数据集成
Step2: 3D interactive segmentation algorithm development 开发 3D 交互式分割算法
Step3: Integration into imaging platforms (e.g., Napari, MITK) 集成到成像平台(例如,Napari,MITK)
Output: Full 3D segmentations from 2D interactions 从 2D 交互生成完整的 3D 分割
9.0 [9.0] 2503.08471 TrackOcc: Camera-based 4D Panoptic Occupancy Tracking
[{'name': 'Zhuoguang Chen, Kenan Li, Xiuyu Yang, Tao Jiang, Yiming Li, Hang Zhao'}]
Autonomous Systems and Robotics 自动驾驶及机器人技术 v2
4D occupancy tracking
autonomous systems
3D tracking
camera-based perception
Input: Camera images 相机图像
Step1: Image feature extraction 图像特征提取
Step2: 4D panoptic queries integration 4D全景查询集成
Step3: Result prediction 结果预测
Output: Panoptic occupancy labels 全景占用标签
8.5 [8.5] 2503.07813 AgriField3D: A Curated 3D Point Cloud and Procedural Model Dataset of Field-Grown Maize from a Diversity Panel
[{'name': 'Elvis Kimara, Mozhgan Hadadi, Jackson Godbersen, Aditya Balu, Talukder Jubery, Yawei Li, Adarsh Krishnamurthy, Patrick S. Schnable, Baskar Ganapathysubramanian'}]
3D Reconstruction and Modeling 三维重建 v2
3D point clouds
agricultural research
maize
Input: 3D point clouds of maize plants 玉米植物的三维点云
Step1: Data collection 数据收集
Step2: Procedural model generation 程序模型生成
Step3: Graph-based segmentation 基于图的分割
Output: Curated dataset for agricultural research 为农业研究提供的整理数据集
8.5 [8.5] 2503.07829 Fixing the RANSAC Stopping Criterion
[{'name': 'Johannes Schönberger, Viktor Larsson, Marc Pollefeys'}]
Multi-view and Stereo Vision 多视角立体视觉 v2
RANSAC
3D reconstruction
robust estimation
Input: Noisy measurements 噪声测量
Step1: Analyze RANSAC sampling probability 分析RANSAC采样概率
Step2: Derive exact stopping criterion 推导精确停止准则
Step3: Evaluate model performance 评估模型性能
Output: Improved model estimation 改进的模型估计
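For reference, the textbook stopping criterion that this paper revisits picks the number of iterations so that, with confidence p, at least one all-inlier minimal sample is drawn (assuming i.i.d. sampling). A minimal sketch of that standard formula — the paper's exact correction is not reproduced here:

```python
import math

def ransac_max_iterations(confidence, inlier_ratio, sample_size):
    """Classical RANSAC stopping criterion:
    N >= log(1 - confidence) / log(1 - inlier_ratio**sample_size)."""
    fail_prob = 1.0 - inlier_ratio ** sample_size  # P(sample contains an outlier)
    return math.ceil(math.log(1.0 - confidence) / math.log(fail_prob))

# e.g. 99% confidence, 50% inliers, 4-point samples (homography)
print(ransac_max_iterations(0.99, 0.5, 4))  # → 72
```

The paper's analysis shows this i.i.d. approximation can stop too early when the inlier count is small relative to the sample size, motivating an exact criterion.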
8.5 [8.5] 2503.07909 FunGraph: Functionality Aware 3D Scene Graphs for Language-Prompted Scene Interaction
[{'name': 'Dennis Rotondi, Fabio Scaparro, Hermann Blum, Kai O. Arras'}]
3D Scene Graphs 3D场景图 v2
3D scene graphs
functional interactive elements
robot perception
affordance grounding
Input: Multi-view RGB-D images 多视角RGB-D图像
Step1: Detect functional elements 检测功能性元件
Step2: Augment 3D scene graph generation 扩展3D场景图生成
Step3: Evaluate functional segmentation 评估功能分割
Output: Enhanced 3D scene graphs 改进的3D场景图
8.5 [8.5] 2503.07933 From Slices to Sequences: Autoregressive Tracking Transformer for Cohesive and Consistent 3D Lymph Node Detection in CT Scans
[{'name': 'Qinji Yu, Yirui Wang, Ke Yan, Dandan Zheng, Dashan Ai, Dazhou Guo, Zhanghexuan Ji, Yanzhou Su, Yun Bian, Na Shen, Xiaowei Ding, Le Lu, Xianghua Ye, Dakai Jin'}]
3D Reconstruction 三维重建 v2
3D reconstruction
medical imaging
Input: 3D CT scans
Step1: Transform slice-based detection to a tracking task
Step2: Develop a transformer decoder for tracking and detection
Step3: Evaluate 3D instance association
Output: Enhanced lymph node detection in 3D CT scans
8.5 [8.5] 2503.07939 STRMs: Spatial Temporal Reasoning Models for Vision-Based Localization Rivaling GPS Precision
[{'name': 'Hin Wai Lui, Jeffrey L. Krichmar'}]
Localization and Navigation 本地化与导航 v2
vision-based localization
3D reconstruction
autonomous navigation
Input: First-person perspective observations (FPP) 第一人称视角观察
Step 1: Data transformation to global map perspective (GMP) 数据转换为全局地图视角
Step 2: Model training using VAE-RNN and VAE-Transformer 使用VAE-RNN和VAE-Transformer进行模型训练
Step 3: Performance evaluation in real-world environments 在真实环境中评估性能
Output: Precise geographical coordinates and localization capabilities 精确的地理坐标和定位能力
8.5 [8.5] 2503.07942 STEAD: Spatio-Temporal Efficient Anomaly Detection for Time and Compute Sensitive Applications
[{'name': 'Andrew Gao, Jun Liu'}]
Autonomous Systems and Robotics 自动驾驶 v2
anomaly detection
autonomous driving
Input: Video data 视频数据
Step1: Anomaly detection algorithm development 异常检测算法开发
Step2: Feature extraction 特征提取
Step3: Model evaluation 模型评估
Output: Identified anomalies 识别出的异常
8.5 [8.5] 2503.08016 SGNetPose+: Stepwise Goal-Driven Networks with Pose Information for Trajectory Prediction in Autonomous Driving
[{'name': 'Akshat Ghiya, Ali K. AlShami, Jugal Kalita'}]
Autonomous Driving 自动驾驶 v2
pedestrian trajectory prediction
autonomous driving
pose estimation
skeleton information
Input: Video data 视频数据
Step1: Extract skeleton information using ViTPose 从ViTPose提取骨骼信息
Step2: Compute joint angles based on skeleton data 根据骨骼数据计算关节角度
Step3: Integrate pose information with bounding box data 将姿态信息与边界框数据集成
Step4: Apply temporal data augmentation for improved performance 进行时间数据增强以提高性能
Output: Predicted pedestrian trajectories 预测行人轨迹
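Step2 above computes joint angles from skeleton keypoints; the standard way is the angle between the two bone vectors meeting at a joint. A minimal sketch (our own illustration of that computation, not the paper's code):

```python
import numpy as np

def joint_angle(parent, joint, child):
    """Angle (degrees) at `joint` between bones joint->parent and joint->child."""
    v1 = np.asarray(parent, dtype=float) - np.asarray(joint, dtype=float)
    v2 = np.asarray(child, dtype=float) - np.asarray(joint, dtype=float)
    cos_a = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # clip guards against round-off pushing cos_a slightly outside [-1, 1]
    return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))

# A right angle at an "elbow": parent above the joint, child to the side
print(joint_angle(parent=[0, 1], joint=[0, 0], child=[1, 0]))  # a right angle, ~90.0
```

Angles like these give the predictor a compact, translation-invariant view of body pose to pair with bounding-box trajectories.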
8.5 [8.5] 2503.08068 Simulating Automotive Radar with Lidar and Camera Inputs
[{'name': 'Peili Song, Dezhen Song, Yifan Yang, Enfan Lan, Jingtai Liu'}]
Autonomous Systems and Robotics 自动驾驶 v2
Automotive radar
Autonomous driving
Data simulation
Neural networks
Lidar and camera integration
Input: Camera images and lidar point clouds 摄像头图像和激光雷达点云
Step1: Estimate radar signal distribution 估计雷达信号分布
Step2: Generate 4D radar signals 生成4D雷达信号
Step3: Predict radar signal strength (RSS) 预测雷达信号强度 (RSS)
Output: Simulated radar datagram 输出: 模拟雷达数据包
8.5 [8.5] 2503.08165 Multimodal Generation of Animatable 3D Human Models with AvatarForge
[{'name': 'Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang'}]
3D Generation 三维生成 v2
3D human modeling 3D人类建模
animatable avatars 可动画头像
LLM integration LLM集成
Input: Text or image inputs 文本或图像输入
Step1: Capture detailed specifications 捕捉详细规范
Step2: Integrate LLM for commonsense reasoning 集成LLM进行常识推理
Step3: Utilize 3D human generators 利用3D人类生成器
Step4: Iterative refinement through auto-verification 通过自动验证进行迭代完善
Output: Customizable, animatable 3D human avatars 输出: 可定制的可动画3D人类头像
8.5 [8.5] 2503.08377 Layton: Latent Consistency Tokenizer for 1024-pixel Image Reconstruction and Generation by 256 Tokens
[{'name': 'Qingsong Xie, Zhao Zhang, Zhe Huang, Yanhao Zhang, Haonan Lu, Zhenyu Yang'}]
Image Generation 图像生成 v2
image reconstruction
latent diffusion models
tokenization
Input: High-resolution images 高分辨率图像
Step1: Image tokenization 图像令牌化
Step2: Latent consistency decoding 潜在一致性解码
Step3: Token compression 令牌压缩
Output: Efficient 1024x1024 image representation 高效的1024x1024图像表示
8.5 [8.5] 2503.08421 Learning to Detect Objects from Multi-Agent LiDAR Scans without Manual Labels
[{'name': 'Qiming Xia, Wenkai Lin, Haoen Xiang, Xun Huang, Siheng Chen, Zhen Dong, Cheng Wang, Chenglu Wen'}]
3D Object Detection 3D物体检测 v2
3D object detection
LiDAR scans
unsupervised learning
Input: Multi-agent LiDAR scans 多代理LiDAR扫描
Step1: Initialization with shared ego-pose and ego-shape 使用共享的自我姿态和自我形状初始化
Step2: Preliminary label generation 生成初步标签
Step3: Multi-scale encoding for label refinement 对标签进行多尺度编码以进行精炼
Step4: Contrastive learning with refined labels 使用精炼标签进行对比学习
Output: High-quality detection results 高质量检测结果
8.5 [8.5] 2503.08483 GAS-NeRF: Geometry-Aware Stylization of Dynamic Radiance Fields
[{'name': 'Nhat Phuong Anh Vu, Abhishek Saroha, Or Litany, Daniel Cremers'}]
Neural Rendering 神经渲染 v2
3D stylization
dynamic radiance fields
Input: Dynamic scenes 动态场景
Step1: Extract depth maps 提取深度图
Step2: Geometry and appearance stylization 几何和外观风格化
Step3: Temporal coherence maintenance 时间一致性维护
Output: Stylized dynamic radiance fields 风格化动态辐射场
8.5 [8.5] 2503.08485 TT-GaussOcc: Test-Time Compute for Self-Supervised Occupancy Prediction via Spatio-Temporal Gaussian Splatting
[{'name': 'Fengyi Zhang, Huitong Yang, Zheng Zhang, Zi Huang, Yadan Luo'}]
3D Reconstruction and Modeling 三维重建 v2
occupancy prediction
3D Gaussians
autonomous driving
Input: Raw sensor streams 原始传感器流
Step1: Lift surrounding-view semantics to instantiate Gaussians 提升周围视图语义以实例化高斯
Step2: Move dynamic Gaussians along estimated scene flow 沿估计的场景流移动动态高斯
Step3: Smooth neighboring Gaussians during optimization 在优化过程中平滑相邻高斯
Output: Voxelized occupancy prediction 体素化占用预测
8.5 [8.5] 2503.08512 SAS: Segment Any 3D Scene with Integrated 2D Priors
[{'name': 'Zhuoyuan Li, Jiahao Lu, Jiacheng Deng, Hanzhi Chang, Lifan Wu, Yanzhe Liang, Tianzhu Zhang'}]
3D Reconstruction and Modeling 三维重建 v2
3D scene understanding
open vocabulary
point cloud
2D to 3D correspondence
Input: Point cloud features and 2D model capabilities 2D模型能力和点云特征
Step1: Model Alignment via Text 通过文本进行模型对齐
Step2: Annotation-Free Model Capability Construction 免标注模型能力构建
Step3: Feature distillation to 3D domain 特征蒸馏到3D域
Output: Integrated 3D scene representations 集成的3D场景表示
8.5 [8.5] 2503.08596 X-Field: A Physically Grounded Representation for 3D X-ray Reconstruction
[{'name': 'Feiran Wang, Jiachen Tao, Junyi Wu, Haoxuan Wang, Bin Duan, Kai Wang, Zongxin Yang, Yan Yan'}]
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
X-ray imaging
medical diagnostics
Input: X-ray projections X射线投影
Step1: Material modeling 材料建模
Step2: Path partitioning algorithm 路径分区算法
Step3: Energy absorption estimation 能量吸收估算
Output: 3D representations of internal structures 内部结构的三维表示
8.5 [8.5] 2503.08673 Keypoint Detection and Description for Raw Bayer Images
[{'name': 'Jiakai Lin, Jinchang Zhang, Guoyu Lu'}]
Robotic Perception 机器人感知 v2
keypoint detection
SLAM
raw images
Input: Raw Bayer images 原始拜尔图像
Step1: Develop convolutional kernels 开发卷积核
Step2: Direct keypoint detection 直接关键点检测
Step3: Feature description 特征描述
Output: Accurate keypoints and descriptors 准确的关键点和描述符
8.5 [8.5] 2503.08683 CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving
[{'name': 'Changxing Liu, Genjia Liu, Zijun Wang, Jinchang Yang, Siheng Chen'}]
Autonomous Systems and Robotics 自主系统与机器人 v2
cooperative autonomous driving
vehicle-to-vehicle communication
LLM-based negotiation
real-time control
Input: Vehicle-to-vehicle data 车辆间数据
Step1: LLM-based negotiation module 建立互动的LLM协商模块
Step2: Intention-guided waypoint generation 意图引导的路径点生成
Step3: Real-time driving control 实时驾驶控制
Output: Improved cooperative driving performance 改进的合作驾驶性能
7.5 [7.5] 2503.08368 Debiased Prompt Tuning in Vision-Language Model without Annotations
[{'name': 'Chaoquan Jiang, Yunfan Yang, Rui Hu, Jitao Sang'}]
Vision-Language Models (VLMs) 视觉语言模型 v2
Vision-Language Models
Robustness
Debiased Prompt Tuning
Input: Vision-Language Models (VLMs) 视觉语言模型
Step1: Analyze spurious correlations 分析虚假相关性
Step2: Utilize zero-shot recognition capabilities 利用零样本识别能力
Step3: Propose a debiased prompt tuning method 提出去偏置的提示调整方法
Output: Improved group robustness 提高的群体稳健性
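As background for how the rows above are produced, here is a minimal sketch of the fetch step — querying the public arXiv Atom API for the newest cs.CV papers and parsing out IDs and titles. The function names and the inline sample feed are illustrative, not the actual code of fetch_cv_3d_papers.py:

```python
import urllib.parse
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

def build_query_url(category="cs.CV", max_results=25):
    """Build an arXiv API query URL for the newest papers in a category."""
    params = urllib.parse.urlencode({
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    })
    return f"http://export.arxiv.org/api/query?{params}"

def parse_feed(atom_xml):
    """Extract (arxiv_id, title) pairs from an arXiv Atom feed string."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.iter(f"{ATOM_NS}entry"):
        # The <id> element holds a URL like http://arxiv.org/abs/2503.06117v1
        arxiv_id = entry.find(f"{ATOM_NS}id").text.rsplit("/", 1)[-1]
        # Titles can wrap across lines; collapse the whitespace.
        title = " ".join(entry.find(f"{ATOM_NS}title").text.split())
        papers.append((arxiv_id, title))
    return papers

# Parsing demo on a tiny inline feed (no network access needed):
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/2503.06117v1</id>
    <title>NeuraLoc: Visual Localization in Neural Implicit Map
      with Dual Complementary Features</title>
  </entry>
</feed>"""

print(build_query_url())
print(parse_feed(SAMPLE))
```

Fetching the URL (e.g. with `urllib.request.urlopen`) and feeding the response body to `parse_feed` yields the candidate list that is then filtered by ChatGPT for 3D relevance.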

Arxiv 2025-03-11

Relevance Title Research Topic Keywords Pipeline
9.5 [9.5] 2503.06117 NeuraLoc: Visual Localization in Neural Implicit Map with Dual Complementary Features
Hongjia Zhai, Boming Zhao, Hai Li, Xiaokun Pan, Yijia He, Zhaopeng Cui, Hujun Bao, Guofeng Zhang
3D Reconstruction and Modeling 三维重建 v2
visual localization
neural implicit maps
3D modeling
Input: 2D images with 3D context 带3D上下文的2D图像
Step1: Extract 2D feature maps 提取2D特征图
Step2: Learn a 3D keypoint descriptor field 学习3D关键点描述符场
Step3: Align feature distributions 对齐特征分布
Step4: Establish matching graph 建立匹配图
Output: 6-DoF pose estimation 6自由度位姿估计
9.5 [9.5] 2503.06154 SRM-Hair: Single Image Head Mesh Reconstruction via 3D Morphable Hair
Zidu Wang, Jiankuo Zhao, Miao Xu, Xiangyu Zhu, Zhen Lei
3D Reconstruction and Modeling 三维重建与建模 v2
3D reconstruction
3DMM
hair modeling
Input: Single image 单张图像
Step1: Data collection 数据收集
Step2: Semantic-consistent ray modeling 语义一致的光线建模
Step3: Hair mesh reconstruction 头发网格重建
Output: 3D hair mesh 3D头发网格
9.5 [9.5] 2503.06219 VLScene: Vision-Language Guidance Distillation for Camera-Based 3D Semantic Scene Completion
Meng Wang, Huilong Pi, Ruihui Li, Yunchuan Qin, Zhuo Tang, Kenli Li
3D Reconstruction and Modeling 三维重建 v2
3D semantic scene completion
autonomous driving
vision-language models
Input: Camera-based images 相机采集图像
Step1: Vision-language guidance distillation 视觉语言指导蒸馏
Step2: Geometric-semantic awareness mechanism 几何-语义感知机制
Step3: Model evaluation 模型评估
Output: Enhanced 3D semantic representations 改进的三维语义表示
9.5 [9.5] 2503.06222 Vision-based 3D Semantic Scene Completion via Capture Dynamic Representations
Meng Wang, Fan Wu, Yunchuan Qin, Ruihui Li, Zhuo Tang, Kenli Li
3D Reconstruction and Modeling 三维重建 v2
3D scene completion
autonomous driving
semantic scene completion
Input: 2D images 2D图像
Step1: Extract 2D explicit semantics and align into 3D space 提取2D显式语义并对齐到3D空间
Step2: Decouple scene information into dynamic and static features 将场景信息解耦为动态与静态特征
Step3: Design dynamic-static adaptive fusion module 设计动态-静态自适应融合模块
Output: Robust and accurate semantic scene representations 鲁棒且准确的语义场景表示
9.5 [9.5] 2503.06235 StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams
Yang LI, Jinglu Wang, Lei Chu, Xiao Li, Shiu-hong Kao, Ying-Cong Chen, Yan Lu
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
Gaussian Splatting
Input: Unposed image streams 未标定图像流
Step1: Predict per-frame Gaussians 逐帧预测高斯
Step2: Establish pixel correspondences 建立像素对应关系
Step3: Merge redundant Gaussians 合并冗余高斯
Output: Online 3D reconstruction 在线三维重建
9.5 [9.5] 2503.06237 Rethinking Lanes and Points in Complex Scenarios for Monocular 3D Lane Detection
Yifan Chang, Junjie Huang, Xiaofeng Wang, Yun Ye, Zhujin Liang, Yi Shan, Dalong Du, Xingang Wang
3D Reconstruction and Modeling 三维重建 v2
3D lane detection
autonomous driving
geometric structures
Input: Monocular images 单目图像
Step1: Theoretical analysis 理论分析
Step2: Patching strategy development 修补策略开发
Step3: Model enhancement 模型增强
Output: Improved lane representations 改进的车道表示
9.5 [9.5] 2503.06462 StructGS: Adaptive Spherical Harmonics and Rendering Enhancements for Superior 3D Gaussian Splatting
Zexu Huang, Min Xu, Stuart Perry
3D Reconstruction and Modeling 三维重建与建模 v2
3D Gaussian Splatting
3D reconstruction
neural rendering
Input: Data from multiple views of a scene 多视角场景数据
Step1: Utilization of 3D Gaussian Splatting 采用3D高斯泼溅
Step2: Dynamic adjustment of spherical harmonics 动态调整球谐
Step3: Incorporation of Multi-scale Residual Network (MSRN) 引入多尺度残差网络
Step4: Rendering of high-quality images from low-resolution inputs 从低分辨率输入生成高质量图像
Output: Enhanced novel views of 3D models 改进的3D模型新视图
9.5 [9.5] 2503.06485 A Mesh Is Worth 512 Numbers: Spectral-domain Diffusion Modeling for High-dimension Shape Generation
Jiajie Fan, Amal Trigui, Andrea Bonfanti, Felix Dietrich, Thomas Bäck, Hao Wang
3D Generation 三维生成 v2
3D generation
spectral-domain diffusion
mesh processing
Input: High-dimensional shapes 高维形状
Step1: Shape encoding using SVD 采用SVD进行形状编码
Step2: Generative modeling on eigenfeatures 在特征向量上进行生成建模
Step3: Mesh generation based on spectral features 基于谱特征生成网格
Output: High-quality 3D shapes 生成高质量的三维形状
9.5 [9.5] 2503.06565 Future-Aware Interaction Network For Motion Forecasting
Shijie Li, Xun Xu, Si Yong Yeo, Xulei Yang
Autonomous Driving 自动驾驶 v2
motion forecasting
autonomous driving
spatiotemporal modeling
Input: Scene encoding with historical trajectories 包含历史轨迹的场景编码
Step1: Integrate future trajectories into encoding 将未来轨迹整合到编码中
Step2: Use Mamba for spatiotemporal modeling 使用Mamba进行时空建模
Step3: Refine and predict future trajectories 精炼并预测未来轨迹
Output: Accurate future trajectory predictions 准确的未来轨迹预测
9.5 [9.5] 2503.06569 Global-Aware Monocular Semantic Scene Completion with State Space Models
Shijie Li, Zhongyao Cheng, Rong Li, Shuai Li, Juergen Gall, Xun Xu, Xulei Yang
3D Reconstruction and Modeling 三维重建与建模 v2
Semantic Scene Completion 语义场景补全
3D Reconstruction 三维重建
Monocular Vision 单目视觉
Input: Single image 单幅图像
Step1: 2D feature extraction 2D特征提取
Step2: Long-range dependency modeling 长程依赖建模
Step3: 3D information completion 3D信息补全
Output: Complete 3D representation 完整的3D表示
9.5 [9.5] 2503.06587 Introducing Unbiased Depth into 2D Gaussian Splatting for High-accuracy Surface Reconstruction
Xiaoming Peng, Yixin Yang, Yang Zhou, Hui Huang
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
Gaussian Splatting
surface reconstruction
Input: 2D Gaussian Splatting data 2D高斯泼溅数据
Step1: Analyze reflection discontinuity 分析反射不连续性
Step2: Introduce depth convergence loss 引入深度收敛损失
Step3: Rectify depth criterion 修正深度标准
Output: Enhanced surface reconstruction 改进的表面重建
9.5 [9.5] 2503.06660 AxisPose: Model-Free Matching-Free Single-Shot 6D Object Pose Estimation via Axis Generation
Yang Zou, Zhaoshuai Qi, Yating Liu, Zihao Xu, Weipeng Sun, Weiyi Liu, Xingyuan Li, Jiaqi Yang, Yanning Zhang
3D Reconstruction and Modeling 三维重建 v2
6D pose estimation 6D姿态估计
robotics 机器人
autonomous driving 自动驾驶
computer vision 计算机视觉
Input: Single view image 单视图图像
Step1: Axis Generation Module (AGM) construction 轴生成模块(AGM)构建
Step2: Geometric consistency loss injection 几何一致性损失注入
Step3: Triaxial Back-projection Module (TBM) application 三轴反投影模块(TBM)应用
Output: Estimated 6D object pose 估计的6D物体姿态
9.5 [9.5] 2503.06677 REArtGS: Reconstructing and Generating Articulated Objects via 3D Gaussian Splatting with Geometric and Motion Constraints
Di Wu, Liu Liu, Zhou Linli, Anran Huang, Liangtu Song, Qiaojun Yu, Qi Wu, Cewu Lu
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
articulated objects
Gaussian Splatting
Input: Multi-view RGB images of articulated objects 多视角RGB图像
Step1: Introduce Signed Distance Field (SDF) guidance to regularize Gaussian opacity fields 引入符号距离场(SDF)引导以规范化高斯不透明度场
Step2: Establish deformable fields for 3D Gaussians constrained by kinematic structures 建立受运动结构约束的3D高斯可变形场
Step3: Achieve unsupervised generation of surface meshes in unseen states 实现对未见状态表面网格的无监督生成
Output: High-quality textured surface reconstruction and generation 高质量纹理表面重建与生成
9.5 [9.5] 2503.06744 CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving
Rui Song, Chenwei Liang, Yan Xia, Walter Zimmer, Hu Cao, Holger Caesar, Andreas Festag, Alois Knoll
Multi-view and Stereo Vision 多视角与立体视觉 v2
4D Gaussian Splatting
dynamic scene rendering
autonomous driving
Input: Dynamic scenes 动态场景
Step1: Use 2D segmentation for Gaussian features 使用2D分割获取高斯特征
Step2: Track temporally deformed features 跟踪时间变形特征
Step3: Aggregate context and deformation features 组合上下文和变形特征
Output: Enhanced dynamic scene representations 改进的动态场景表示
9.5 [9.5] 2503.06762 Gaussian RBFNet: Gaussian Radial Basis Functions for Fast and Accurate Representation and Reconstruction of Neural Fields
Abdelaziz Bouzidi, Hamid Laga, Hazem Wannous
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
neural fields
Gaussian RBF
Input: Neural fields and images 神经场和图像
Step1: Replace MLP neurons with RBF kernels 用RBF核替换MLP神经元
Step2: Train for 3D geometry representation 训练3D几何表示
Step3: Optimize for novel view synthesis 优化新视图合成
Output: Fast and accurate neural representation 快速准确的神经表示
9.5 [9.5] 2503.06818 Sub-Image Recapture for Multi-View 3D Reconstruction
Yanwei Wang
3D Reconstruction and Modeling 三维重建 v2
3D reconstruction
multi-view geometry
Input: Original high-resolution images 原始高分辨率图像
Step1: Split images into sub-images 将图像分割成子图像
Step2: Process sub-images individually 分别处理子图像
Step3: Apply existing 3D reconstruction algorithms using sub-images 对子图像应用现有三维重建算法
Output: Enhanced 3D reconstruction results 改进的三维重建结果
9.5 [9.5] 2503.06821 HierDAMap: Towards Universal Domain Adaptive BEV Mapping via Hierarchical Perspective Priors
Siyu Li, Yihong Cao, Hao Shi, Yongsheng Zang, Xuan He, Kailun Yang, Zhiyong Li
Autonomous Driving 自动驾驶 v2
Bird's-Eye View (BEV) mapping
domain adaptation
3D mapping
Input: Multi-view images 多视角图像
Step1: Hierarchical perspective prior-guided domain adaptation 分层视角先验引导的领域适应
Step2: Component integration 组件集成 (SGPS, DACL, CDFM)
Step3: Performance evaluation 性能评估
Output: Enhanced BEV mapping results 改进的鸟瞰映射结果
9.5 [9.5] 2503.06900 DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation