github_bot_3d_papers
Calls the arXiv API and automatically updates the paper list
Daily Updates on 3D-Related Papers
This repository automatically fetches new or updated arXiv papers in the [cs.CV] category every day, checks if they are relevant to "3D reconstruction" or "3D generation" via ChatGPT, and lists them below.
How It Works
- A GitHub Actions workflow runs daily at 09:00 UTC.
- It uses the script fetch_cv_3d_papers.py (a minimal sketch follows below) to:
  - Retrieve the latest arXiv papers in cs.CV.
  - Use ChatGPT to select the papers related to 3D reconstruction/generation.
  - Update this README.md with the new findings.
  - Send an email via 163 Mail if any relevant papers are found.
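For reference, here is a minimal sketch of what such a pipeline can look like. It is not the actual fetch_cv_3d_papers.py: it assumes the `feedparser` and `openai` packages, uses a placeholder model name and prompt, reads credentials from hypothetical environment variables (`OPENAI_API_KEY`, `MAIL_USER`, `MAIL_PASS`, `MAIL_TO`), and reduces the relevance scoring to a yes/no check instead of the 0–10 score, topic, keywords, and pipeline summary shown in the tables below.

```python
# Minimal sketch of the daily pipeline (not the real fetch_cv_3d_papers.py).
import os
import smtplib
from email.message import EmailMessage

import feedparser          # parses the Atom feed returned by the arXiv API
from openai import OpenAI  # official OpenAI client (v1.x interface)

ARXIV_URL = ("http://export.arxiv.org/api/query?"
             "search_query=cat:cs.CV&sortBy=submittedDate"
             "&sortOrder=descending&max_results=100")

def fetch_papers():
    """Fetch the most recent cs.CV submissions from the arXiv API."""
    feed = feedparser.parse(ARXIV_URL)
    return [(e.title, e.summary, e.link) for e in feed.entries]

def is_relevant(client, title, abstract):
    """Ask a chat model whether a paper is about 3D reconstruction/generation."""
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content":
                   "Is this paper about 3D reconstruction or 3D generation? "
                   f"Answer yes or no.\nTitle: {title}\nAbstract: {abstract}"}],
    )
    return reply.choices[0].message.content.strip().lower().startswith("yes")

def send_mail(body):
    """Email the hit list; 163 Mail exposes SMTP over SSL on smtp.163.com:465."""
    msg = EmailMessage()
    msg["Subject"] = "New 3D papers on arXiv"
    msg["From"] = os.environ["MAIL_USER"]
    msg["To"] = os.environ["MAIL_TO"]
    msg.set_content(body)
    with smtplib.SMTP_SSL("smtp.163.com", 465) as smtp:
        smtp.login(os.environ["MAIL_USER"], os.environ["MAIL_PASS"])
        smtp.send_message(msg)

if __name__ == "__main__":
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    hits = [(t, l) for t, s, l in fetch_papers() if is_relevant(client, t, s)]
    if hits:
        send_mail("\n".join(f"{t}\n{l}" for t, l in hits))
```

The daily trigger itself lives in the GitHub Actions workflow; a cron entry such as `0 9 * * *` fires at 09:00 UTC.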
Paper List
arXiv 2025-04-15
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | 2504.08901 HAL-NeRF: High Accuracy Localization Leveraging Neural Radiance Fields (Asterios Reppas, Grigorios-Aris Cheimariotis, Panos K. Papadopoulos, Panagiotis Frasiolas, Dimitrios Zarpalas) | Neural Rendering | Camera relocalization; Neural Radiance Fields; Autonomous driving | Input: Camera captures → Step1: Initial pose estimation using CNN → Step2: Data augmentation with NeRFs → Step3: Refinement using Monte Carlo particle filter → Output: High-accuracy camera localization |
| 9.5 | 2504.09048 BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting (Yongchang Wu, Zipeng Qi, Zhenwei Shi, Zhengxia Zou) | 3D Reconstruction and Modeling | 3D reconstruction; Gaussian Splatting; novel view synthesis | Input: Multi-view images → Step1: Content-aware scene partitioning → Step2: Individual block optimization → Step3: Block merging and fusion → Output: High-quality novel view synthesis |
| 9.5 | 2504.09062 You Need a Transition Plane: Bridging Continuous Panoramic 3D Reconstruction with Perspective Gaussian Splatting (Zhijie Shen, Chunyu Lin, Shujuan Huang, Lang Nie, Kang Liao, Yao Zhao) | 3D Reconstruction and Modeling | 3D reconstruction; Gaussian splatting; panoramic images | Input: Panoramic images → Step1: Introduce transition planes → Step2: Optimize 3D Gaussians in cubemap faces → Step3: Stitch cube faces into an equirectangular panorama → Output: Enhanced 3D models via Gaussian splatting |
| 9.5 | 2504.09129 A Constrained Optimization Approach for Gaussian Splatting from Coarsely-posed Images and Noisy Lidar Point Clouds (Jizong Peng, Tze Ho Elden Tse, Kai Xu, Wenchao Gao, Angela Yao) | 3D Reconstruction and Modeling | 3D reconstruction; Gaussian Splatting; camera pose estimation | Input: Coarsely-posed images and noisy Lidar point clouds → Step1: Decompose camera pose estimation into optimization steps → Step2: Apply constrained optimization with geometric constraints → Step3: Perform simultaneous camera pose estimation and 3D reconstruction → Output: High-quality 3D reconstructions |
| 9.5 | 2504.09149 MASH: Masked Anchored SpHerical Distances for 3D Shape Representation and Generation (Changhao Li, Yu Xin, Xiaowei Zhou, Ariel Shamir, Hao Zhang, Ligang Liu, Ruizhen Hu) | 3D Reconstruction and Modeling | 3D shape representation; surface reconstruction; generative model | Input: Point clouds → Step1: MASH parameterization → Step2: Differentiable optimization → Step3: Surface approximation → Output: MASH representation |
| 9.5 | 2504.09328 Text To 3D Object Generation For Scalable Room Assembly (Sonia Laguna, Alberto Garcia-Garcia, Marie-Julie Rakotosaona, Stylianos Moschoglou, Leonhard Helminger, Sergio Orts-Escolano) | 3D Generation | 3D generation; synthetic data; Neural Radiance Fields | Input: Text prompts → Step1: Prompt engineering → Step2: Synthetic data generation → Step3: Integration into room layouts → Output: Customizable 3D indoor scenes |
| 9.5 | 2504.09491 DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering (Yexing Xu, Longguang Wang, Minglin Chen, Sheng Ao, Li Li, Yulan Guo) | 3D Reconstruction | 3D Gaussian Splatting; novel view synthesis; overfitting; dropout technique | Input: Sparse-view images → Step1: Analyze performance degradation → Step2: Implement dropout technique → Step3: Integrate edge-guided strategy → Output: Improved novel view synthesis outputs |
| 9.5 | 2504.09518 3D CoCa: Contrastive Learners are 3D Captioners (Ting Huang, Zeyu Zhang, Yemin Wang, Hao Tang) | 3D Captioning and Vision-Language Learning | 3D captioning; contrastive learning; vision-language models | Input: 3D scenes with point clouds → Step1: Contrastive pretraining on visual and textual data → Step2: Multimodal decoding for caption generation → Step3: Joint optimization of spatial reasoning and captioning tasks → Output: Enhanced descriptive captions for 3D scenes |
| 9.5 | 2504.09535 FastRSR: Efficient and Accurate Road Surface Reconstruction from Bird's Eye View (Yuting Zhao, Yuheng Ji, Xiaoshuai Hao, Shuxiao Li) | 3D Reconstruction | Road Surface Reconstruction; Autonomous Driving; Depth-Aware Projection | Input: Bird's Eye View images → Step1: Depth-Aware 3D-to-2D Projection (DAP) module → Step2: Spatial Attention Enhancement (SAE) module → Step3: Confidence Attention Generation (CAG) module → Output: Accurate road surface reconstruction |
| 9.5 | 2504.09588 TextSplat: Text-Guided Semantic Fusion for Generalizable Gaussian Splatting (Zhicong Wu, Hongbin Xu, Gang Xu, Ping Nie, Zhixin Yan, Jinkai Zheng, Liangqiong Qu, Ming Li, Liqiang Nie) | 3D Reconstruction | 3D Reconstruction; Gaussian Splatting; Semantic Fusion | Input: Sparse multi-view images → Step1: Diffusion Prior Depth Estimator for depth information → Step2: Semantic Aware Segmentation Network for semantic information → Step3: Multi-View Interaction Network to refine cross-view features → Step4: Text-Guided Semantic Fusion Module to integrate representations → Output: High-fidelity 3D reconstructions |
| 9.5 | 2504.09878 MCBlock: Boosting Neural Radiance Field Training Speed by MCTS-based Dynamic-Resolution Ray Sampling (Yunpeng Tan, Junlin Hao, Jiangkai Wu, Liming Liu, Qingyang Li, Xinggong Zhang) | Neural Rendering | Neural Radiance Field; 3D reconstruction; ray sampling | Input: Training images → Step1: Block partitioning → Step2: Block-tree initialization → Step3: Dynamic optimization → Output: Accelerated ray sampling for NeRF |
| 9.5 | 2504.10001 GaussVideoDreamer: 3D Scene Generation with Video Diffusion and Inconsistency-Aware Gaussian Splatting (Junlin Hao, Peiheng Wang, Haoyang Wang, Xinggong Zhang, Zongming Guo) | 3D Reconstruction and Modeling | 3D reconstruction; video diffusion; Gaussian Splatting | Input: Video sequences → Step1: Geometry-aware initialization → Step2: Inconsistency-Aware Gaussian Splatting → Step3: Progressive video inpainting → Output: Enhanced 3D scenes |
| 9.5 | 2504.10012 EBAD-Gaussian: Event-driven Bundle Adjusted Deblur Gaussian Splatting (Yufei Deng, Yuanjian Wang, Rong Xiao, Chenwei Tang, Jizhe Zhou, Jiahao Fan, Deng Xiong, Jiancheng Lv, Huajin Tang) | 3D Reconstruction and Modeling | 3D reconstruction; event camera; motion blur removal; multi-modal fusion | Input: Event streams and blurred images → Step1: Construct a blur loss function → Step2: Optimize Gaussian parameters and camera trajectories → Step3: Evaluate reconstruction quality → Output: High-fidelity 3D reconstruction |
| 9.5 | 2504.10035 TT3D: Table Tennis 3D Reconstruction (Thomas Gossard, Andreas Ziegler, Andreas Zell) | 3D Reconstruction | 3D reconstruction; table tennis; motion analysis; sports analytics | Input: Online table tennis match recordings → Step1: Camera calibration from segmentation masks of known scene geometry → Step2: Detect 2D ball positions with a deep-learning-based ball detector → Step3: Reconstruct 3D ball trajectories from the calibrated views using a physics-based model → Output: Full 3D reconstruction of table tennis rallies |
| 9.5 | 2504.10117 AGO: Adaptive Grounding for Open World 3D Occupancy Prediction (Peizheng Li, Shuxiao Ding, You Zhou, Qingwen Zhang, Onat Inak, Larissa Triess, Niklas Hanselmann, Marius Cordts, Andreas Zell) | 3D Reconstruction and Modeling | 3D occupancy prediction; autonomous driving; vision-language models | Input: Sensor inputs (images) → Step1: Encode images into 3D and text embeddings → Step2: Similarity-based grounding training with 3D pseudo-labels → Step3: Map 3D embeddings to align with VLM-derived image embeddings → Output: Improved voxelized 3D occupancy predictions |
| 9.5 | 2504.10331 LL-Gaussian: Low-Light Scene Reconstruction and Enhancement via Gaussian Splatting for Novel View Synthesis (Hao Sun, Fenggen Yu, Huiyao Xu, Tao Zhang, Changqing Zou) | 3D Reconstruction and Modeling | 3D reconstruction; novel view synthesis | Input: Low-light sRGB images → Step1: Low-Light Gaussian Initialization Module (LLGIM) → Step2: Dual-branch Gaussian decomposition model → Step3: Unsupervised optimization strategy → Output: High-quality 3D point clouds |
| 9.5 | 2504.10466 Art3D: Training-Free 3D Generation from Flat-Colored Illustration (Xiaoyan Cong, Jiayi Shen, Zekun Li, Rao Fu, Tao Lu, Srinath Sridhar) | 3D Generation | 3D generation; flat-colored images; image-to-3D models | Input: Flat-colored 2D illustrations → Step1: Generate multiple 3D proxy candidates → Step2: Select the best candidate for 3D generation → Step3: Texture the generated mesh based on the original input → Output: Realistic 3D models |
| 9.2 | 2504.10106 SoccerNet-v3D: Leveraging Sports Broadcast Replays for 3D Scene Understanding (Marc Gutiérrez-Pérez, Antonio Agudo) | 3D Reconstruction | 3D reconstruction; multi-view synchronization; camera calibration; soccer analysis | Input: Multi-view synchronized images from soccer broadcasts → Step1: Camera calibration using field-line annotations → Step2: Triangulation of 2D annotations to generate 3D positions → Step3: Optimization of bounding boxes based on multi-view data → Output: 3D ball localization annotations |
| 9.0 | 2504.09086 RICCARDO: Radar Hit Prediction and Convolution for Camera-Radar 3D Object Detection (Yunfei Long, Abhinav Kumar, Xiaoming Liu, Daniel Morris) | 3D Object Detection | 3D object detection; camera-radar fusion; autonomous vehicles | Input: Monocular detections → Step1: Predict radar hit distributions → Step2: Match radar points with the predicted distribution → Step3: Refine detection scores via fusion refinement → Output: Enhanced 3D object detection |
| 9.0 | 2504.09160 SCFlow2: Plug-and-Play Object Pose Refiner with Shape-Constraint Scene Flow (Qingyuan Wang, Rui Song, Jiaojiao Li, Kerui Cheng, David Ferstl, Yinlin Hu) | 3D Object Pose Estimation | 6D object pose estimation; RGBD images; 3D shape constraints | Input: RGBD frames → Step1: Introduce geometry constraints → Step2: Combine rigid motion and 3D shape priors → Step3: Iterative optimization → Output: Accurate 6D object poses |
| 8.5 | 2504.09097 BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting (Jeongwan On, Kyeonghwan Gwak, Gunyoung Kang, Junuk Cha, Soohyun Hwang, Hyein Hwang, Seungryul Baek) | 3D Reconstruction and Modeling | 3D reconstruction; hand-object interaction; Gaussian Splatting | Input: Monocular RGB video → Step1: Separate optimization of hand and object Gaussians → Step2: Joint optimization to account for interactions → Output: 3D Gaussians of hands and an unknown object |
| 8.5 | 2504.09498 EasyREG: Easy Depth-Based Markerless Registration and Tracking using Augmented Reality Device for Surgical Guidance (Yue Yang, Christoph Leuze, Brian Hargreaves, Bruce Daniel, Fred Baik) | 3D Reconstruction and Modeling | markerless registration; surgical guidance; depth sensing; augmented reality | Input: Depth data from an AR device → Step1: Robust point cloud registration → Step2: Human-in-the-loop sensor error correction → Step3: Global alignment with curvature-aware feature sampling → Step4: Local ICP refinement → Output: Accurate anatomical localization and tracking |
| 8.5 | 2504.09506 Pillar-Voxel Fusion Network for 3D Object Detection in Airborne Hyperspectral Point Clouds (Yanze Jiang, Yanfeng Gu, Xian Li) | 3D Object Detection | 3D object detection; hyperspectral point clouds; feature fusion | Input: Hyperspectral point clouds (HPCs) → Step1: Pillar-voxel dual-branch encoder → Step2: Multi-level feature fusion mechanism for information interaction → Step3: Validation on airborne HPC datasets → Output: Enhanced 3D object detection performance |
| 8.5 | 2504.09540 EmbodiedOcc++: Boosting Embodied 3D Occupancy Prediction with Plane Regularization and Uncertainty Sampler (Hao Wang, Xiaobao Wei, Xiaoan Zhang, Jianing Li, Chengyu Bai, Ying Li, Ming Lu, Wenzhao Zheng, Shanghang Zhang) | 3D Reconstruction and Modeling | 3D occupancy prediction; 3D Gaussian Splatting | Input: Monocular RGB images → Step1: Geometry-guided Refinement Module (GRM) → Step2: Semantic-aware Uncertainty Sampler (SUS) → Step3: Gaussian updates → Output: Improved 3D occupancy predictions |
| 8.5 | 2504.09623 Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding (Atharv Mahesh Mane, Dulanga Weerakoon, Vigneshwaran Subbaraju, Sougata Sen, Sanjay E. Sarma, Archan Misra) | 3D Reconstruction and Modeling | 3D embodied reference understanding; data augmentation; language grounding | Input: Language description and pointing gesture → Step1: Data augmentation to insert human avatars into 3D scenes → Step2: 3D-ERU model development incorporating human localization → Step3: Dataset curation to create the ImputeRefer dataset → Output: Enhanced model for 3D embodied reference understanding |
| 8.5 | 2504.09671 LightHeadEd: Relightable & Editable Head Avatars from a Smartphone (Pranav Manu, Astitva Srivastava, Amit Raj, Varun Jampani, Avinash Sharma, P. J. Narayanan) | 3D Reconstruction and Modeling | 3D reconstruction; head avatars; smartphone; polarization; real-time rendering | Input: Monocular video streams → Step1: Capture polarized video streams → Step2: Decompose surface properties → Step3: Learn a head avatar representation → Output: Relightable 3D head avatars |
| 8.5 | 2504.09789 EquiVDM: Equivariant Video Diffusion Models with Temporally Consistent Noise (Chao Liu, Arash Vahdat) | Image and Video Generation | video generation; 3D consistency; motion alignment | Input: Video frames and 3D meshes → Step1: Generate temporally consistent noise from the input video → Step2: Attach noise as textures on 3D meshes → Step3: Train a video diffusion model with the noise → Output: Coherent video frames with 3D consistency |
| 8.5 | 2504.09953 Efficient 2D to Full 3D Human Pose Uplifting including Joint Rotations (Katja Ludwig, Yuliia Oksymets, Robin Schön, Daniel Kienzle, Rainer Lienhart) | 3D Reconstruction and Modeling | 3D human pose estimation; joint rotations; sports analytics; 2D-to-3D uplifting | Input: 2D keypoints from video frames → Step1: Model design for 2D-to-3D conversion → Step2: Rotation representation selection → Step3: Evaluation of joint localization and rotation accuracy → Output: Accurate 3D human poses including joint rotations |
| 8.5 | 2504.10024 Relative Illumination Fields: Learning Medium and Light Independent Underwater Scenes (Mengkun She, Felix Seegräber, David Nakath, Patricia Schöntag, Kevin Köser) | Neural Rendering | Neural Radiance Fields; 3D Reconstruction; Underwater Imaging | Input: Images of underwater scenes → Step1: Model illumination fields → Step2: Integrate volumetric representation → Step3: Optimize the pipeline → Output: Photorealistic scene representation |
| 8.5 | 2504.10123 M2S-RoAD: Multi-Modal Semantic Segmentation for Road Damage Using Camera and LiDAR Data (Tzu-Yun Tseng, Hongyu Lyu, Josephine Li, Julie Stephany Berrio, Mao Shan, Stewart Worrall) | Autonomous Systems and Robotics | multi-modal dataset; semantic segmentation; road damage detection; LiDAR; camera | Input: Camera and LiDAR data → Step1: Data collection → Step2: Semantic segmentation algorithms → Step3: Dataset generation → Output: M2S-RoAD dataset |
| 8.5 | 2504.10275 LMFormer: Lane based Motion Prediction Transformer (Harsh Yadav, Maximilian Schaefer, Kun Zhao, Tobias Meisen) | Autonomous Systems and Robotics | motion prediction; autonomous driving; lane-aware transformer | Input: Dynamic and static context for trajectory prediction → Step1: Lane-aware attention mechanism → Step2: Graph Neural Network-based map encoding → Step3: Iterative refinement with transformer layers → Output: Improved trajectory predictions |
| 8.5 | 2504.10316 ESCT3D: Efficient and Selectively Controllable Text-Driven 3D Content Generation with Gaussian Splatting (Huiqi Wu, Jianbo Mei, Yingjie Huang, Yining Xu, Jingjiao You, Yilong Liu, Li Yao) | 3D Generation | 3D generation; text-to-3D; multi-view integration | Input: Simple text inputs and additional conditions → Step1: Self-optimization to refine text prompts → Step2: Generate 3D content from the refined prompts → Step3: Integrate multi-view information to enhance quality → Output: High-quality, controllable 3D content |
| 8.5 | 2504.10350 Benchmarking 3D Human Pose Estimation Models Under Occlusions (Filipa Lino, Carlos Santiago, Manuel Marques) | 3D Reconstruction and Modeling | 3D Human Pose Estimation; occlusions; dataset synthesis | Input: Multi-camera setups → Step1: Dataset synthesis → Step2: Model testing → Step3: Performance evaluation → Output: Insights on model robustness |
| 8.5 | 2504.10433 MonoDiff9D: Monocular Category-Level 9D Object Pose Estimation via Diffusion Model (Jian Liu, Wei Sun, Hui Yang, Jin Zheng, Zichen Geng, Hossein Rahmani, Ajmal Mian) | 3D Reconstruction and Modeling | object pose estimation; monocular; diffusion model; 3D reconstruction; autonomous systems | Input: Monocular image → Step1: Coarse depth estimation → Step2: Point cloud generation → Step3: Feature fusion → Step4: Pose recovery → Output: 9D object pose estimation |
| 8.5 | 2504.10485 Decoupled Diffusion Sparks Adaptive Scene Generation (Yunsong Zhou, Naisheng Ye, William Ljungbergh, Tianyu Li, Jiazhi Yang, Zetong Yang, Hongzi Zhu, Christoffer Petersson, Hongyang Li) | Image and Video Generation | scene generation; autonomous driving; data collection | Input: Decoupled noise states for scene generation → Step1: Noise-masking training strategy → Step2: Simulate complex driving scenarios → Step3: Integrate goal conditioning with environmental updates → Output: Realistic and adaptive scene generation |
| 8.5 | 2504.10486 DNF-Avatar: Distilling Neural Fields for Real-time Animatable Avatar Relighting (Zeren Jiang, Shaofei Wang, Siyu Tang) | Neural Rendering | 3D Gaussian splatting; real-time rendering; animatable avatars | Input: Monocular videos → Step1: Knowledge distillation → Step2: Geometry and appearance estimation → Step3: Shadow computation → Output: Real-time relightable avatars |
| 7.5 | 2504.10049 Summarization of Multimodal Presentations with Vision-Language Models: Study of the Effect of Modalities and Structure (Théo Gigant, Camille Guinaudeau, Frédéric Dufaux) | Vision-Language Models (VLMs) | Vision-Language Models; multimodal presentations; automatic summarization | Input: Multimodal presentations → Step1: Benchmarking VLMs → Step2: Analysis of input representations → Step3: Cost and performance evaluation → Output: Summarized presentations |
| 6.5 | 2504.09426 BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning (Shengao Wang, Arjun Chandra, Aoming Liu, Venkatesh Saligrama, Boqing Gong) | Vision-Language Models (VLMs) | Vision-Language Models; data-efficient pretraining; infant learning | Input: Infant-inspired datasets → Step1: Design evaluation tasks → Step2: Data distillation for synthetic augmentation → Step3: Model training and evaluation → Output: Improved VLM performance |
| 6.5 | 2504.09724 A Survey on Efficient Vision-Language Models (Gaurav Shinde, Anuradha Ravi, Emon Dey, Shadman Sakib, Milind Rampure, Nirmalya Roy) | VLM & VLA | vision-language models; edge devices | Input: Vision-language models → Step1: Review optimization techniques → Step2: Explore compact architectures → Step3: Analyze performance-memory trade-offs → Output: Efficient VLMs for edge devices |
arXiv 2025-04-14
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | 2504.08100 ContrastiveGaussian: High-Fidelity 3D Generation with Contrastive Learning and Gaussian Splatting (Junbang Liu, Enpei Huang, Dongxing Mao, Hui Zhang, Xinyuan Song, Yongxin Ni) | 3D Generation | 3D generation; contrastive learning; Gaussian splatting | Input: Single-view images → Step1: Image upscaling using super-resolution → Step2: Generate novel perspectives via a 2D diffusion model → Step3: Incorporate contrastive learning with Gaussian splatting → Step4: Optimize the model using a Quantity-Aware Triplet Loss → Output: Enhanced and consistent 3D models |
| 9.5 | 2504.08252 Stereophotoclinometry Revisited (Travis Driver, Andrew Vaughan, Yang Cheng, Adnan Ansar, John Christian, Panagiotis Tsiotras) | 3D Reconstruction and Modeling | 3D reconstruction; structure from motion; photoclinometry | Input: In-situ imagery → Step1: Keypoint detection and matching → Step2: Integration of photoclinometry into SfM → Step3: Simultaneous parameter optimization → Output: Enhanced surface models |
| 9.5 | 2504.08361 SN-LiDAR: Semantic Neural Fields for Novel Space-time View LiDAR Synthesis (Yi Chen, Tianchen Deng, Wentao Zhao, Xiaoning Wang, Wenqian Xi, Weidong Chen, Jingchuan Wang) | 3D Reconstruction and Modeling | LiDAR synthesis; 3D reconstruction; semantic segmentation; autonomous driving | Input: LiDAR point clouds → Step1: Coarse-to-fine planar-grid feature extraction → Step2: Semantic segmentation using CNN → Step3: Joint geometric reconstruction and synthesis → Output: Realistic LiDAR scans with semantic labels |
| 9.5 | 2504.08410 PMNI: Pose-free Multi-view Normal Integration for Reflective and Textureless Surface Reconstruction (Mingzhi Pei, Xu Cao, Xiangyi Wang, Heng Guo, Zhanyu Ma) | 3D Reconstruction and Modeling | 3D reconstruction; surface normal estimation; autonomous driving | Input: Multi-view surface normal maps → Step1: Exploit geometric constraints from surface normals → Step2: Joint optimization of surface shape and camera poses → Step3: Evaluation of surface geometry and camera poses → Output: High-fidelity surface reconstruction |
| 9.5 | 2504.08419 GeoTexBuild: 3D Building Model Generation from Map Footprints (Ruizhe Wang, Junyan Yang, Qiao Wang) | 3D Reconstruction and Modeling | 3D building generation; GeoTexBuild; ControlNet; Text2Mesh | Input: Map footprints → Step1: Height map generation → Step2: Geometry reconstruction → Step3: Appearance stylization → Output: 3D building models |
| 9.5 | 2504.08675 X2BR: High-Fidelity 3D Bone Reconstruction from a Planar X-Ray Image with Hybrid Neural Implicit Methods (Gokce Guven, H. Fatih Ugurdag, Hasan F. Ates) | 3D Reconstruction and Modeling | 3D reconstruction; bone modeling; neural implicit methods | Input: Single planar X-ray image → Step1: Feature extraction using ConvNeXt → Step2: Continuous volumetric reconstruction → Step3: Template-guided non-rigid registration → Output: Anatomically consistent 3D bone volume |
| 9.0 | 2504.08280 PNE-SGAN: Probabilistic NDT-Enhanced Semantic Graph Attention Network for LiDAR Loop Closure Detection (Xiong Li, Shulei Liu, Xingning Chen, Yisong Wu, Dong Zhu) | Simultaneous Localization and Mapping (SLAM) | LiDAR; loop closure detection; semantic graph; SLAM | Input: LiDAR point cloud data → Step1: Graph construction → Step2: Feature enhancement using NDT → Step3: Graph Attention Network processing → Step4: Probabilistic filtering for loop closure detection → Output: Enhanced loop closure detection results |
| 9.0 | 2504.08412 Boosting the Class-Incremental Learning in 3D Point Clouds via Zero-Collection-Cost Basic Shape Pre-Training (Chao Qi, Jianqin Yin, Meng Chen, Yingchun Niu, Yuan Sun) | 3D Reconstruction and Modeling | 3D point clouds; class-incremental learning; geometry knowledge | Input: 3D point clouds → Step1: Create a basic shape dataset → Step2: Pre-train the model on geometric knowledge → Step3: Implement the incremental learning framework → Output: Enhanced class-incremental learning capabilities |
| 8.5 | 2504.08125 Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects (Shalini Maiti, Lourdes Agapito, Filippos Kokkinos) | 3D Generation | text-to-3D generation; evaluation metrics; vision large language models | Input: Multi-view images → Step1: Data integration → Step2: Feature extraction through vLLMs → Step3: Quality assessment of 3D objects → Output: Evaluation scores for generated 3D objects |
| 8.5 | 2504.08154 Investigating Vision-Language Model for Point Cloud-based Vehicle Classification (Yiqiao Li, Jie Wei, Camille Kamga) | Point Cloud Processing | vision-language models; point cloud processing; autonomous driving | Input: Point cloud data from LiDAR captures → Step1: Preprocessing pipeline to adapt point clouds for the VLM → Step2: Point cloud registration and classification → Step3: Model evaluation and experimentation → Output: Efficient classification results |
| 8.5 | 2504.08307 DSM: Building A Diverse Semantic Map for 3D Visual Grounding (Qinghongbing Xie, Zijian Liang, Long Zeng) | 3D Reconstruction and Modeling | 3D Visual Grounding; Semantic Map; Vision-Language Models | Input: Multi-view images and VLM data → Step1: Construct the Diverse Semantic Map (DSM) → Step2: Enhance scene understanding based on the DSM → Step3: Implement DSM-Grounding for 3D Visual Grounding → Output: Improved performance in robotic tasks |
| 8.5 | 2504.08348 Geometric Consistency Refinement for Single Image Novel View Synthesis via Test-Time Adaptation of Diffusion Models (Josef Bengtson, David Nilsson, Fredrik Kahl) | Image Generation | novel view synthesis; geometric consistency; diffusion models | Input: Single image and relative pose → Step1: Generate a candidate image with a diffusion model → Step2: Compute matching points → Step3: Formulate a geometric consistency loss → Step4: Optimize the noise to minimize the loss → Output: Geometrically consistent image |
| 8.5 | 2504.08414 Adversarial Examples in Environment Perception for Automated Driving (Review) (Jun Yan, Huilin Yin) | Autonomous Systems and Robotics | adversarial examples; automated driving; adversarial robustness | Input: Overview of adversarial examples in deep learning for automated driving → Step1: Literature review of adversarial robustness and its methods → Step2: Analysis of adversarial impact on automated driving tasks → Step3: Discussion of future directions and research needs → Output: Comprehensive survey of adversarial examples in the automated driving context |
| 8.5 | 2504.08473 Cut-and-Splat: Leveraging Gaussian Splatting for Synthetic Data Generation (Bram Vanherle, Brent Zoomers, Jeroen Put, Frank Van Reeth, Nick Michiels) | 3D Reconstruction and Modeling | Gaussian Splatting; 3D models; synthetic data generation | Input: Video of the target object → Step1: Train a Gaussian Splatting model → Step2: Extract the object from the video → Step3: Render the object onto a background → Output: High-quality synthetic images |
| 8.5 | 2504.08551 Shadow Erosion and Nighttime Adaptability for Camera-Based Automated Driving Applications (Mohamed Sabry, Gregory Schroeder, Joshua Varughese, Cristina Olaverri-Monreal) | Image Generation | image enhancement; autonomous driving; shadow mitigation; nighttime visibility | Input: Images from RGB cameras → Step1: Apply Shadow Erosion to reduce shadows → Step2: Apply Nighttime Adaptability to improve visibility → Step3: Evaluate with visual perception quality metrics → Output: Enhanced images for autonomous driving applications |
| 8.5 | 2504.08581 FMLGS: Fast Multilevel Language Embedded Gaussians for Part-level Interactive Agents (Xin Tan, Yuzhou Ji, He Zhu, Yuan Xie) | Neural Rendering | 3D Gaussian Splatting; 3D scene modeling; language-embedded radiance fields | Input: Posed images → Step1: Extract SAM masks → Step2: Filter redundant masks → Step3: Semantic mapping → Output: Part-level localization results |
| 8.5 | 2504.08736 GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation (Tianwei Xiong, Jun Hao Liew, Zilong Huang, Jiashi Feng, Xihui Liu) | Image Generation | image reconstruction; autoregressive generation; tokenizers; semantic regularization | Input: Visual tokenizers → Step1: Identify latent space complexity → Step2: Propose semantic regularization → Step3: Scale tokenizers with key practices → Output: Enhanced image reconstruction and generation |
| 8.0 | 2504.08452 Road Grip Uncertainty Estimation Through Surface State Segmentation (Jyri Maanpää, Julius Pesonen, Iaroslav Melekhov, Heikki Hyyti, Juha Hyyppä) | Autonomous Driving | grip uncertainty prediction; autonomous driving; surface state segmentation | Input: Road surface state segmentation strategy → Step1: Benchmark uncertainty prediction methods → Step2: Estimate pixel-wise grip probability distributions → Step3: Evaluate robustness of predictions → Output: Enhanced grip uncertainty predictions |
| 8.0 | 2504.08540 Datasets for Lane Detection in Autonomous Driving: A Comprehensive Review (Jörg Gamerdinger, Sven Teufel, Oliver Bringmann) | Autonomous Systems and Robotics | lane detection; autonomous driving; datasets | Input: Lane detection datasets → Step1: Comprehensive review of datasets → Step2: Classification based on key factors → Step3: Identification of challenges and gaps → Output: Recommendations for dataset improvement |
| 7.5 | 2504.08422 CMIP-CIL: A Cross-Modal Benchmark for Image-Point Class Incremental Learning (Chao Qi, Jianqin Yin, Ren Zhang) | Image and Video Generation | incremental learning; cross-modal learning; 3D vision | Input: 2D images and 3D point clouds → Step1: Generate masked point clouds → Step2: Create multi-view images → Step3: Contrastive learning framework → Output: Generalizable image-point correspondence |
arXiv 2025-04-11
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | 2504.07335 DLTPose: 6DoF Pose Estimation From Accurate Dense Surface Point Estimates (Akash Jadhav, Michael Greenspan) | 3D Object Pose Estimation | 6DoF pose estimation; RGB-D images; 3D surface estimation | Input: RGB-D images → Step1: Predict per-pixel radial distances → Step2: Direct Linear Transform for 3D surface estimation → Step3: Keypoint ordering to handle symmetry → Output: Accurate 6DoF object pose estimation |
| 9.5 | 2504.07370 View-Dependent Uncertainty Estimation of 3D Gaussian Splatting (Chenyu Han, Corentin Dumery) | 3D Reconstruction and Modeling | 3D reconstruction; uncertainty estimation; Gaussian Splatting | Input: 3D Gaussian Splatting (3DGS) data → Step1: Model uncertainty as view-dependent features → Step2: Represent uncertainty with spherical harmonics → Step3: Integrate into the traditional 3DGS pipeline → Output: Improved uncertainty estimation for 3D reconstruction |
| 9.5 | 2504.07524 DGOcc: Depth-aware Global Query-based Network for Monocular 3D Occupancy Prediction (Xu Zhao, Pengju Zhang, Bo Liu, Yihong Wu) | 3D Occupancy Prediction | 3D occupancy prediction; autonomous driving; depth context features | Input: 2D images and prior depth maps → Step1: Extract depth context features → Step2: Global Query-based Module → Step3: Hierarchical Supervision Strategy → Output: Monocular 3D occupancy predictions |
| 9.5 | 2504.07943 HoloPart: Generative 3D Part Amodal Segmentation (Yunhan Yang, Yuan-Chen Guo, Yukun Huang, Zi-Xin Zou, Zhipeng Yu, Yangguang Li, Yan-Pei Cao, Xihui Liu) | 3D Reconstruction and Modeling | 3D part segmentation; shape completion; 3D reconstruction | Input: Incomplete part segments → Step1: Initial part segmentation → Step2: Apply the HoloPart diffusion-based model → Output: Complete 3D parts |
| 9.5 | 2504.07958 Detect Anything 3D in the Wild (Hanxue Zhang, Haoran Jiang, Qingsong Yao, Yanan Sun, Renrui Zhang, Hao Zhao, Hongyang Li, Hongzi Zhu, Zetong Yang) | 3D Reconstruction and Modeling | 3D detection; zero-shot learning; autonomous driving | Input: Monocular images → Step1: Feature alignment → Step2: Knowledge transfer → Step3: Model evaluation → Output: Generalized 3D detection results |
| 9.5 | 2504.07961 Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction (Zeren Jiang, Chuanxia Zheng, Iro Laina, Diane Larlus, Andrea Vedaldi) | 3D Reconstruction and Modeling | 4D reconstruction; monocular video; video generators; dynamic scenes | Input: Monocular videos → Step1: Train video diffusion models on synthetic data → Step2: Predict geometric modalities including point, disparity, and ray maps → Step3: Multi-modal alignment and fusion at inference time → Output: 4D reconstruction of dynamic scenes |
| 9.2 | 2504.07853 V2V3D: View-to-View Denoised 3D Reconstruction for Light-Field Microscopy (Jiayin Zhao, Zhenqi Fu, Tao Yu, Hui Qiao) | 3D Reconstruction and Modeling | 3D reconstruction; light field microscopy; denoising; wave optics | Input: Light field images → Step1: Framework for simultaneous denoising and 3D reconstruction → Step2: View-to-view paired image processing → Step3: Feature alignment using a wave-optics-based technique → Output: High-quality 3D reconstructed volumes |
| 9.0 | 2504.07334 Objaverse++: Curated 3D Object Dataset with Quality Annotations (Chendi Lin, Heshan Liu, Qunshu Lin, Zachary Bright, Shitao Tang, Yihui He, Minghao Liu, Ling Zhu, Cindy Le) | 3D Reconstruction and Modeling | 3D reconstruction; quality annotation; generative models | Input: Annotated 3D object dataset → Step1: Manual annotation of 10,000 objects → Step2: Train a neural network to automate tagging → Step3: Evaluate datasets on quality attributes → Output: Enhanced dataset of 500,000 3D models |
| 8.5 | 2504.07260 Quantifying Epistemic Uncertainty in Absolute Pose Regression (Fereidoon Zangeneh, Amit Dekel, Alessandro Pieropan, Patric Jensfelt) | Simultaneous Localization and Mapping (SLAM) | absolute pose regression; visual localization; uncertainty estimation | Input: Image data → Step1: Train an absolute pose regression model → Step2: Quantify epistemic uncertainty → Step3: Validate predictions → Output: Confidence measures for predictions |
| 8.5 | 2504.07375 Novel Diffusion Models for Multimodal 3D Hand Trajectory Prediction (Junyi Ma, Wentao Bao, Jingyi Xu, Guanzhong Sun, Xieyuanli Chen, Hesheng Wang) | 3D Reconstruction and Modeling | 3D hand trajectory prediction; multimodal learning; robot manipulation; autonomous systems | Input: Multimodal data including 2D RGB images and 3D point clouds → Step1: Extract features from each modality → Step2: Integrate multimodal features with a hybrid Mamba-Transformer module → Step3: Predict future hand trajectories and camera egomotion → Output: Future 3D hand trajectories and corresponding egomotion |
| 8.5 | 2504.07382 Model Discrepancy Learning: Synthetic Faces Detection Based on Multi-Reconstruction (Qingchao Jiang, Zhishuo Xu, Zhiying Zhu, Ning Chen, Haoyue Wang, Zhongjie Ba) | 3D Reconstruction and Modeling | synthetic face detection; reconstruction discrepancies | Input: Multi-reconstruction of synthetic images → Step1: Analyze reconstruction discrepancies → Step2: Develop a multi-reconstruction-based detector → Step3: Evaluate detection performance → Output: Accurate differentiation between real and synthetic faces |
| 8.5 | 2504.07418 ThermoStereoRT: Thermal Stereo Matching in Real Time via Knowledge Distillation and Attention-based Refinement (Anning Hu, Ang Li, Xirui Jin, Danping Zou) | Stereo Vision | thermal stereo matching; 3D reconstruction; autonomous systems | Input: Rectified thermal stereo images → Step1: Feature extraction → Step2: Cost volume construction → Step3: Disparity estimation → Step4: Disparity map refinement → Output: Final disparity map |
| 8.5 | 2504.07491 Kimi-VL Technical Report (Kimi Team et al.) | Vision-Language Models (VLMs) | vision-language model; multimodal reasoning; long-context processing | Input: Vision-language inputs → Step1: MoE model design → Step2: Long-context processing → Step3: Multimodal reasoning → Output: Advanced VLM capabilities |
| 8.5 | 2504.07603 RASMD: RGB And SWIR Multispectral Driving Dataset for Robust Perception in Adverse Conditions (Youngwan Jin, Michal Kovac, Yagiz Nalcakan, Hyeongjin Ju, Hanbin Song, Sanghyeop Yeo, Shiho Kim) | Autonomous Driving | RGB; SWIR; autonomous driving dataset; object detection | Input: RGB and SWIR image pairs → Step1: Dataset collection → Step2: Annotation for object detection and translation → Step3: Experimental evaluation → Output: Benchmark multispectral driving dataset |
| 8.5 | 2504.07615 VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model (Haozhan Shen, Peng Liu, Jingcheng Li, Chunxin Fang, Yibo Ma, Jiajia Liao, Qiaoli Shen, Zilun Zhang, Kangjia Zhao, Qianqian Zhang, Ruochen Xu, Tiancheng Zhao) | Vision-Language Models (VLMs) | Vision-Language Models; Reinforcement Learning | Input: Vision-language tasks → Step1: Rule-based reward formulation → Step2: Model training with reinforcement learning → Step3: Performance evaluation on visual tasks → Output: Improved VLM performance |
| 8.5 | 2504.07949 InteractAvatar: Modeling Hand-Face Interaction in Photorealistic Avatars with Deformable Gaussians (Kefan Chen, Sergiu Oprea, Justin Theiss, Sreyas Mohan, Srinath Sridhar, Aayush Prakash) | 3D Reconstruction and Modeling | 3D modeling; hand-face interaction; avatars; realistic animation; Gaussian Splatting | Input: Monocular or multi-view videos → Step1: Gaussian kernel anchoring → Step2: Pose-dependent animation → Step3: Interaction modeling → Output: Photorealistic avatar animation |
| 8.5 | 2504.07955 BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation (Yuanhong Yu, Xingyi He, Chen Zhao, Junhao Yu, Jiaqi Yang, Ruizhen Hu, Yujun Shen, Xing Zhu, Xiaowei Zhou, Sida Peng) | 3D Reconstruction and Modeling | 3D object pose estimation; sparse-view reconstruction | Input: Sparse-view RGB images → Step1: Recover the 3D bounding box from sparse views → Step2: Predict 2D projections of the bounding box corners in the query view → Output: 6DoF object pose estimation |
| 7.5 | 2504.07542 SydneyScapes: Image Segmentation for Australian Environments (Hongyu Lyu, Julie Stephany Berrio, Mao Shan, Stewart Worrall) | Autonomous Driving | image segmentation; autonomous vehicles; dataset; machine learning | Input: Collection of urban images from Sydney → Step1: Image segmentation task definition → Step2: Annotation of images with semantic, instance, and panoptic labels → Step3: Benchmarking with state-of-the-art algorithms → Output: Dataset for AV perception algorithm development |
arXiv 2025-04-10
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | 2504.06716 GSta: Efficient Training Scheme with Siestaed Gaussians for Monocular 3D Scene Reconstruction (Anil Armagan, Albert Saà-Garriga, Bruno Manganelli, Kyuwon Kim, M. Kerim Yucel) | 3D Reconstruction and Modeling | 3D reconstruction; Gaussian Splatting; efficiency; autonomous driving | Input: Monocular images → Step1: Gaussian identification → Step2: Freeze converged Gaussians → Step3: Early stopping mechanism → Output: Efficiently trained 3D reconstruction model |
| 9.5 | 2504.06719 Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding (Pedro Hermosilla, Christian Stippel, Leon Sick) | 3D Reconstruction and Modeling | 3D scene understanding; self-supervised learning; masked modeling | Input: Hierarchical 3D models → Step1: Multi-resolution feature sampling → Step2: Hierarchical masking approach → Step3: Feature reconstruction → Output: Semantic-aware 3D features |
| 9.5 | 2504.06801 MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection (Rishubh Parihar, Srinjay Sarkar, Sarthak Vora, Jogendra Kundu, R. Venkatesh Babu) | 3D Reconstruction and Modeling | 3D object detection; data augmentation; monocular detection | Input: Background scene → Step1: Learn a distribution of plausible 3D bounding boxes → Step2: Render realistic objects → Step3: Place objects according to the learned distribution → Output: Enhanced monocular 3D detection performance |
| 9.5 | 2504.06815 SVG-IR: Spatially-Varying Gaussian Splatting for Inverse Rendering (Hanxiao Sun, YuPeng Gao, Jin Xie, Jian Yang, Beibei Wang) | 3D Reconstruction | inverse rendering; 3D Gaussian Splatting; novel view synthesis; relighting | Input: Images for 3D asset reconstruction → Step1: Apply a spatially-varying Gaussian representation → Step2: Integrate a physically-based indirect lighting model → Step3: Evaluate NVS and relighting quality → Output: Enhanced rendering quality |
| 9.5 | 2504.06827 IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments (Can Zhang, Gim Hee Lee) | 3D Reconstruction and Modeling | 3D reconstruction; interactive affordance; articulated objects | Input: Multi-view posed images → Step1: 3D model construction → Step2: Hierarchical feature field construction → Step3: Semantic-guided mask association across states → Step4: Affordance prediction → Step5: Motion recovery → Output: Interactive affordance system |
| 9.5 | 2504.06978 Wheat3DGS: In-field 3D Reconstruction, Instance Segmentation and Phenotyping of Wheat Heads with Gaussian Splatting (Daiwei Zhang, Joaquin Gajardo, Tomislav Medic, Isinsu Katircioglu, Mike Boss, Norbert Kirchgessner, Achim Walter, Lukas Roth) | 3D Reconstruction and Modeling | 3D reconstruction; instance segmentation; phenotyping; Gaussian Splatting | Input: Multi-view RGB images → Step1: Data integration → Step2: Instance segmentation using the Segment Anything Model (SAM) → Step3: 3D reconstruction using 3D Gaussian Splatting → Output: Detailed 3D models of wheat heads |
| 9.5 | 2504.06982 SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets (Yuhang Yang, Fengqi Liu, Yixing Lu, Qin Zhao, Pingyu Wu, Wei Zhai, Ran Yi, Yang Cao, Lizhuang Ma, Zheng-Jun Zha, Junting Dong) | 3D Reconstruction and Modeling | 3D reconstruction; 3D human generation; Gaussian modeling | Input: Multi-view images → Step1: Latent space compression → Step2: Gaussian representation generation → Step3: Large-scale dataset construction → Output: High-quality 3D human Gaussians |
| 9.5 | 2504.07025 Glossy Object Reconstruction with Cost-effective Polarized Acquisition (Bojian Wu, Yifan Peng, Ruizhen Hu, Xiaowei Zhou) | 3D Reconstruction and Modeling | 3D reconstruction; polarization imaging; neural rendering | Input: Multi-view polarization images → Step1: Data acquisition → Step2: Model the polarimetric BRDF with neural implicit fields → Step3: Minimize rendering loss → Output: High-fidelity geometry and radiance decomposition |
| 9.2 | 2504.06397 PromptHMR: Promptable Human Mesh Recovery (Yufu Wang, Yu Sun, Priyanka Patel, Kostas Daniilidis, Michael J. Black, Muhammed Kocabas) | 3D Reconstruction and Modeling | human pose estimation; 3D shape recovery | Input: Images containing people → Step1: Utilize bounding boxes or masks → Step2: Extract features with a vision transformer → Step3: Process prompts and image data → Output: Estimated human pose and shape |
| 8.5 | 2504.06292 Temporal-contextual Event Learning for Pedestrian Crossing Intent Prediction (Hongbin Liang, Hezhe Qiao, Wei Huang, Qizhou Wang, Mingsheng Shang, Lin Chen) | Autonomous Systems and Robotics | pedestrian crossing intention; autonomous driving; temporal contextual learning | Input: Observed video frames → Step1: Temporal merging to cluster key events → Step2: Contextual attention to aggregate features → Output: Enhanced pedestrian crossing intent prediction |
| 8.5 | 2504.06464 Implementation of a Zed 2i Stereo Camera for High-Frequency Shoreline Change and Coastal Elevation Monitoring (José A. Pilartes-Congo, Matthew Kastl, Michael J. Starek, Marina Vicens-Miquel, Philippe Tissot) | 3D Reconstruction and Modeling | 3D reconstruction; coastal monitoring | Input: Multi-view images → Step1: Intrinsic camera calibration → Step2: Georectification and registration of acquired imagery and point clouds → Step3: Generation of Digital Surface Models (DSM) → Output: 3D point cloud and georectified imagery |
| 8.5 | 2504.06527 TSP-OCS: A Time-Series Prediction for Optimal Camera Selection in Multi-Viewpoint Surgical Video Analysis (Xinyu Liu, Xiaoguang Lin, Xiang Liu, Yong Yang, Hongqian Wang, Qilong Sun) | Multi-view and Stereo Vision | multi-viewpoint; camera selection; surgical video analysis | Input: Multi-view surgical videos → Step1: Feature extraction → Step2: Time-series prediction → Step3: Camera selection → Output: Optimal camera views |
| 8.5 | 2504.06620 InstantSticker: Realistic Decal Blending via Disentangled Object Reconstruction (Yi Zhang, Xiaoyang Huang, Yishun Dou, Yue Shi, Rui Shi, Ye Chen, Bingbing Ni, Wenjun Zhang) | 3D Reconstruction and Modeling | decal blending; 3D reconstruction; real-time rendering | Input: Multi-view images → Step1: Decal blending preparation → Step2: Shadow factor integration → Step3: ARAP parameterization → Output: High-quality decal blending results |
| 8.5 | 2504.06627 FACT: Multinomial Misalignment Classification for Point Cloud Registration (Ludvig Dillén, Per-Erik Forssén, Johan Edstedt) | Point Cloud Processing | point cloud registration; alignment quality prediction; multinomial misalignment classification | Input: Registered lidar point cloud pairs → Step1: Feature extraction → Step2: Processing with a point-transformer-based network → Step3: Multinomial misalignment classification → Output: Misalignment class prediction |
| 8.5 | 2504.06638 HGMamba: Enhancing 3D Human Pose Estimation with a HyperGCN-Mamba Network (Hu Cui, Tessai Hayama) | 3D Reconstruction and Modeling | 3D human pose estimation; Hyper-GCN; Mamba networks | Input: 2D human pose data → Step1: Model local structures → Step2: Model global dependencies → Step3: Adaptive fusion → Output: 3D human pose estimates |
| 8.5 | 2504.06647 Uni-PrevPredMap: Extending PrevPredMap to a Unified Framework of Prior-Informed Modeling for Online Vectorized HD Map Construction (Nan Peng, Xun Zhou, Mingming Wang, Guisong Chen, Songming Chen) | Autonomous Systems and Robotics | autonomous driving; HD maps | Input: Previous predictions and simulated outdated HD maps → Step1: Framework development → Step2: Efficient data processing and retrieval → Step3: Model validation and performance evaluation → Output: Enhanced online vectorized HD maps |
| 8.5 | 2504.06742 nnLandmark: A Self-Configuring Method for 3D Medical Landmark Detection (Alexandra Ertl, Shuhan Xiao, Stefan Denner, Robin Peretzke, David Zimmerer, Peter Neher, Fabian Isensee, Klaus Maier-Hein) | 3D Reconstruction and Modeling | 3D landmark detection; nnU-Net; medical imaging | Input: 3D medical images → Step1: Adapt nnU-Net for landmark detection → Step2: Heatmap-based regression → Step3: Model evaluation and validation → Output: Accurate 3D landmark detection |
| 8.5 | 2504.06803 DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation (Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Hao Luo, Yibing Song, Gao Huang, Fan Wang, Yang You) | Image and Video Generation | visual generation; Diffusion Transformers; computational efficiency | Input: Visual generation tasks → Step1: Dynamic computation adjustment → Step2: Implement TDW and SDT strategies → Step3: Integrate with existing diffusion models → Output: Efficient visual generation |
| 8.5 | 2504.06863 MovSAM: A Single-image Moving Object Segmentation Framework Based on Deep Thinking (Chang Nie, Yiqing Xu, Guangming Wang, Zhe Liu, Yanzi Miao, Hesheng Wang) | Robotic Perception | moving object segmentation; deep learning; autonomous driving; machine learning | Input: Single images → Step1: Generate text prompts with a Multimodal Large Language Model (MLLM) → Step2: Segment moving objects using the Segment Anything Model (SAM) and a Vision-Language Model (VLM) → Step3: Deep thinking loop to refine segmentation results → Output: Segmented moving objects |
| 8.5 | 2504.06920 S-EO: A Large-Scale Dataset for Geometry-Aware Shadow Detection in Remote Sensing Applications (Masquil Elías, Marí Roger, Ehret Thibaud, Meinhardt-Llopis Enric, Musé Pablo, Facciolo Gabriele) | 3D Reconstruction and Modeling | 3D reconstruction; shadow detection; remote sensing | Input: Multi-date, multi-angle satellite imagery → Step1: Data collection and annotation → Step2: Training of a shadow detection model → Step3: Integration with 3D reconstruction models → Output: Improved shadow detection and 3D model quality |
| 8.5 | [8.5] 2504.06925 Are Vision-Language Models Ready for Dietary Assessment? Exploring the Next Frontier in AI-Powered Food Image Recognition [{'name': "Sergio Romero-Tapiador, Ruben Tolosana, Blanca Lacruz-Pleguezuelos, Laura Judith Marcos Zambrano, Guadalupe X. Baz\'an, Isabel Espinosa-Salinas, Julian Fierrez, Javier Ortega-Garcia, Enrique Carrillo de Santa Pau, Aythami Morales"}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models food image recognition dietary assessment |
Input: Food images 食品图像 Step1: Database creation 数据库创建 Step2: Model evaluation 模型评估 Step3: Comparison of VLMs with expert annotations 与专家注释的VLM比较 Output: Food recognition results 食品识别结果 |
| 8.5 | [8.5] 2504.07093 FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution [{'name': 'Gene Chou, Wenqi Xian, Guandao Yang, Mohamed Abdelfattah, Bharath Hariharan, Noah Snavely, Ning Yu, Paul Debevec'}] |
Depth Estimation 深度估计 | v2 depth estimation real-time processing video analysis |
Input: Streaming video at 2K resolution 2K分辨率视频 Step1: Preprocess video frames 预处理视频帧 Step2: Depth estimation using modified pretrained model 使用修改后的预训练模型进行深度估计 Step3: Alignment of depth features 对深度特征进行对齐 Output: High-resolution depth maps 输出高分辨率深度图 |
| 7.0 | [7.0] 2504.06835 LVC: A Lightweight Compression Framework for Enhancing VLMs in Long Video Understanding [{'name': 'Ziyi Wang, Haoran Wu, Yiming Rong, Deyang Jiang, Yixin Zhang, Yunlong Zhao, Shuang Xu, Bo XU'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models long video understanding |
Input: Short video-text pairs 短视频-文本对 Step1: Video compression 视频压缩 Step2: Model enhancement 模型增强 Output: Improved VLM performance 改进的VLM性能 |
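Several pipelines in the table above hinge on heatmap-based regression (e.g., nnLandmark, 2504.06742, which regresses heatmaps for 3D medical landmarks). Below is a minimal NumPy sketch of the general encode/decode idea, assuming dense 3D volumes and a hand-picked Gaussian sigma; it illustrates the technique only and is not the paper's implementation.

```python
import numpy as np

def make_heatmap(shape, landmark, sigma=2.0):
    """Render a single 3D Gaussian heatmap centered at a landmark.

    shape    -- (D, H, W) of the target volume
    landmark -- (z, y, x) voxel coordinates of the landmark
    sigma    -- standard deviation of the Gaussian blob (assumed value)
    """
    zz, yy, xx = np.meshgrid(
        np.arange(shape[0]), np.arange(shape[1]), np.arange(shape[2]),
        indexing="ij",
    )
    d2 = (zz - landmark[0]) ** 2 + (yy - landmark[1]) ** 2 + (xx - landmark[2]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def decode_heatmap(heatmap):
    """Recover landmark coordinates as the argmax of a predicted heatmap."""
    return np.unravel_index(np.argmax(heatmap), heatmap.shape)

# Round-trip check: encode a landmark, then decode it back.
target = make_heatmap((32, 32, 32), (10, 20, 5))
assert decode_heatmap(target) == (10, 20, 5)
```

A detection network would be trained to predict `target` from the image volume; at inference the landmark is simply read off as the heatmap argmax.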
Arxiv 2025-04-09
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | [9.5] 2504.05400 GARF: Learning Generalizable 3D Reassembly for Real-World Fractures [{'name': 'Sihang Li, Zeyu Jiang, Grace Chen, Chenyang Xu, Siqi Tan, Xue Wang, Irving Fang, Kristof Zyskowski, Shannon P. McPherron, Radu Iovita, Chen Feng, Jing Zhang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reassembly 三维重组 fracture 断裂 dataset 数据集 |
Input: Various fractured 3D objects 各种破碎的三维物体 Step1: Fracture-aware feature learning 破碎感知特征学习 Step2: Flow matching for alignment 对齐的流匹配 Step3: One-step preassembly for robustness 一步预组装以提高鲁棒性 Output: Reassembled 3D models 重新组装的三维模型 |
| 9.5 | [9.5] 2504.05649 POD: Predictive Object Detection with Single-Frame FMCW LiDAR Point Cloud [{'name': 'Yining Shi, Kun Jiang, Xin Zhao, Kangan Qian, Chuchu Xie, Tuopu Wen, Mengmeng Yang, Diange Yang'}] |
3D Object Detection 3D物体检测 | v2 3D object detection FMCW LiDAR autonomous driving |
Input: Single-frame FMCW LiDAR point cloud Step1: Generate virtual future point using ray casting Step2: Create virtual two-frame point clouds Step3: Encode with a sparse 4D encoder Output: Predictive object detection results |
| 9.5 | [9.5] 2504.05698 Point-based Instance Completion with Scene Constraints [{'name': 'Wesley Khademi, Li Fuxin'}] |
3D Reconstruction and Modeling 三维重建 | v2 point cloud 3D reconstruction scene completion autonomous systems |
Input: Partial point clouds of objects 场景中物体的部分点云 Step1: Seed generation 生成种子点 Step2: Scene constraints integration 场景约束集成 Step3: Instance completion 实例补全 Output: Completed 3D objects 补全后的三维对象 |
| 9.5 | [9.5] 2504.05720 QEMesh: Employing A Quadric Error Metrics-Based Representation for Mesh Generation [{'name': 'Jiaqi Li, Ruowei Wang, Yu Liu, Qijun Zhao'}] |
3D Generation 三维生成 | v2 3D reconstruction mesh generation Quadric Error Metrics |
Input: Multi-view images 多视角图像 Step1: Data integration 数据集成 Step2: Algorithm development 算法开发 Step3: Model evaluation 模型评估 Output: Enhanced 3D models 改进的三维模型 |
| 9.5 | [9.5] 2504.05751 InvNeRF-Seg: Fine-Tuning a Pre-Trained NeRF for 3D Object Segmentation [{'name': 'Jiangsan Zhao, Jakob Geipel, Krzysztof Kusnierek, Xuean Cui'}] |
3D Segmentation 3D分割 | v2 Neural Radiance Fields 3D segmentation fine-tuning |
Input: Multi-view RGB images and 2D segmentation masks 多视角RGB图像和2D分割掩膜 Step1: Train standard NeRF on RGB images 使用RGB图像训练标准NeRF Step2: Fine-tune using 2D segmentation masks using the same NeRF architecture 使用相同的NeRF架构对2D分割掩膜进行微调 Output: Segmented 3D point clouds 输出:分割的3D点云 |
| 9.5 | [9.5] 2504.06178 Flash Sculptor: Modular 3D Worlds from Objects [{'name': 'Yujia Hu, Songhua Liu, Xingyi Yang, Xinchao Wang'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D reconstruction scene generation modular objects image-to-3D |
Input: Single image 单幅图像 Step1: Decouple tasks 任务分解 Step2: Estimate parameters 估计参数 Step3: Generate 3D scene 生成三维场景 Output: Compositional 3D scene 组合三维场景 |
| 9.5 | [9.5] 2504.06210 HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation [{'name': 'Yiming Liang, Tianhan Xu, Yuta Kikuchi'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction monocular videos |
Input: Monocular video 单目视频 Step1: Motion decomposition 运动分解 Step2: Hierarchical representation design 层次表示设计 Step3: Gaussian deformation adjustment 高斯变形调整 Output: Enhanced dynamic 3D model 改进的动态三维模型 |
| 9.5 | [9.5] 2504.06264 D^2USt3R: Enhancing 3D Reconstruction with 4D Pointmaps for Dynamic Scenes [{'name': 'Jisang Han, Honggyu An, Jaewoo Jung, Takuya Narihira, Junyoung Seo, Kazumi Fukuda, Chaehyun Kim, Sunghwan Hong, Yuki Mitsufuji, Seungryong Kim'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction dynamic scenes 4D pointmaps |
Input: Multi-view images and dynamic scene data 多视角图像和动态场景数据 Step1: Regressing 4D pointmaps 回归4D点图 Step2: Establishing dense correspondences 建立密集对应 Step3: Model training with temporal awareness 具有时间感知的模型训练 Output: Enhanced 3D reconstruction 改进的三维重建 |
| 9.2 | [9.2] 2504.06003 econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians [{'name': 'Can Zhang, Gim Hee Lee'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D semantic segmentation 3D语义分割 multi-view consistency 多视角一致性 open-vocabulary segmentation 开放词汇分割 |
Input: Multi-view images 多视角图像 Step1: Data refinement using Confidence-region Guided Regularization (CRR) 使用置信区域引导正则化进行数据细化 Step2: Constructing a low-dimensional contextual space 创建低维上下文空间 Step3: Fusing backprojected multi-view features 融合反投影的多视角特征 Output: 3D semantic field representation 3D语义场表示 |
| 8.5 | [8.5] 2504.05422 EP-Diffuser: An Efficient Diffusion Model for Traffic Scene Generation and Prediction via Polynomial Representations [{'name': 'Yue Yao, Mohamed-Khalil Bouzidi, Daniel Goehring, Joerg Reichardt'}] |
Autonomous Systems and Robotics 自动驾驶 | v2 traffic scene generation autonomous vehicles generative models |
Input: Road layout and agent history 道路布局和智能体历史 Step1: Model design using polynomial representations 使用多项式表示进行模型设计 Step2: Training the diffusion-based generative model 训练基于扩散的生成模型 Step3: Evaluating traffic scene predictions 评估交通场景预测 Output: Diverse and plausible traffic scene continuations 多样且合理的交通场景延续 (a polynomial-trajectory sketch follows this table) |
| 8.5 | [8.5] 2504.05579 TAPNext: Tracking Any Point (TAP) as Next Token Prediction [{'name': 'Artem Zholus, Carl Doersch, Yi Yang, Skanda Koppula, Viorica Patraucean, Xu Owen He, Ignacio Rocco, Mehdi S. M. Sajjadi, Sarath Chandar, Ross Goroshin'}] |
3D Reconstruction and Modeling 三维重建 | v2 point tracking 3D reconstruction robotics |
Input: Video frames 视频帧 Step1: Point tracking 点追踪 Step2: Token decoding 令牌解码 Step3: Model evaluation 模型评估 Output: Accurate point tracks 准确的点轨迹 |
| 8.5 | [8.5] 2504.05786 How to Enable LLM with 3D Capacity? A Survey of Spatial Reasoning in LLM [{'name': 'Jirong Zha, Yuxuan Fan, Xiao Yang, Chen Gao, Xinlei Chen'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D spatial understanding Large Language Models multimodal fusion autonomous vehicles robotics |
Input: Integration of Large Language Models (LLMs) with 3D spatial understanding Step1: Categorization into image-based, point cloud-based, and hybrid modality methods Step2: Systematic review of existing research methods Step3: Discussion on limitations and future directions Output: Comprehensive framework for 3D-LLM integration |
| 8.5 | [8.5] 2504.05882 Turin3D: Evaluating Adaptation Strategies under Label Scarcity in Urban LiDAR Segmentation with Semi-Supervised Techniques [{'name': 'Luca Barco, Giacomo Blanco, Gaetano Chiriaco, Alessia Intini, Luigi La Riccia, Vittorio Scolamiero, Piero Boccardo, Paolo Garza, Fabrizio Dominici'}] |
3D Semantic Segmentation 三维语义分割 | v2 3D segmentation LiDAR urban modeling |
Input: Aerial LiDAR data 空中激光雷达数据 Step1: Dataset collection 数据集收集 Step2: Performance benchmarking 性能基准测试 Step3: Semi-supervised learning application 半监督学习应用 Output: Improved 3D semantic segmentation results 改进的3D语义分割结果 |
| 8.5 | [8.5] 2504.05908 PRIMEDrive-CoT: A Precognitive Chain-of-Thought Framework for Uncertainty-Aware Object Interaction in Driving Scene Scenario [{'name': 'Sriram Mandalika, Lalitha V, Athira Nambiar'}] |
Autonomous Driving 自动驾驶 | v2 3D object detection autonomous driving uncertainty-aware modeling |
Input: LiDAR-based 3D object detection and multi-view RGB references Step1: Model Training with Bayesian Graph Neural Networks (BGNNs) Step2: Uncertainty modeling for object interactions Step3: Evaluation on DriveCoT dataset Output: Enhanced decision-making under uncertainty |
| 8.0 | [8.0] 2504.05458 Optimizing 4D Gaussians for Dynamic Scene Video from Single Landscape Images [{'name': 'In-Hwan Jin, Haesoo Choo, Seong-Hun Jeong, Heemoon Park, Junghwan Kim, Oh-joon Kwon, Kyeongbo Kong'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D space virtualization 3D空间虚拟化 dynamic scene video 动态场景视频 |
Input: Single landscape image 单个景观图像 Step1: Generate multi-view images 生成多视角图像 Step2: Optimize 3D Gaussians 优化3D高斯 Step3: Estimate consistent 3D motion 估计一致的3D运动 Output: Dynamic scene video 动态场景视频 |
| 8.0 | [8.0] 2504.05979 An Empirical Study of GPT-4o Image Generation Capabilities [{'name': 'Sixiang Chen, Jinbin Bai, Zhuoran Zhao, Tian Ye, Qingyu Shi, Donghao Zhou, Wenhao Chai, Xin Lin, Jianzong Wu, Chao Tang, Shilin Xu, Tao Zhang, Haobo Yuan, Yikang Zhou, Wei Chow, Linfeng Li, Xiangtai Li, Lei Zhu, Lu Qi'}] |
Image Generation 图像生成 | v2 image generation multimodal models GPT-4o image-to-3D generation |
Input: Generative models and tasks 生成模型和任务 Step1: Evaluation against existing models 与现有模型的评估 Step2: Benchmarking across categories 在各类任务中的基准测试 Step3: Comparative analysis of strengths and limitations 优势和局限性的比较分析 Output: Comprehensive evaluation results 综合评估结果 |
| 7.5 | [7.5] 2504.05402 Time-adaptive Video Frame Interpolation based on Residual Diffusion [{'name': 'Victor Fonte Chavez, Claudia Esteves, Jean-Bernard Hayet'}] |
Image and Video Generation 图像生成与视频生成 | v2 Video Frame Interpolation Diffusion Models Animation |
Input: Animation frames 动画帧 Step1: Time handling during training 训练过程中的时间处理 Step2: Adapt diffusion scheme for VFI 适应扩散方案用于视频帧插值 Step3: Uncertainty estimation 不确定性估计 Output: Interpolated video frames 插值视频帧 |
| 6.5 | [6.5] 2504.05456 Generative Adversarial Networks with Limited Data: A Survey and Benchmarking [{'name': 'Omar De Mitri, Ruyu Wang, Marco F. Huber'}] |
Image Generation 图像生成 | v2 Generative Adversarial Networks Limited Data Image Synthesis Generative Models |
Input: Limited datasets 限量数据 Step 1: Literature review 文献综述 Step 2: Performance evaluation 性能评估 Step 3: Challenge identification 挑战识别 Output: Insights on GAN performance 生成对抗网络性能见解 |
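EP-Diffuser (2504.05422) in the table above represents agent trajectories with polynomials before feeding them to a diffusion model. The sketch below shows only the polynomial-representation step, assuming uniformly sampled 2D waypoints and an arbitrarily chosen degree; the function names and values are illustrative, not the paper's.

```python
import numpy as np

def fit_trajectory(times, xy, degree=3):
    """Fit one polynomial per coordinate to an agent's observed track.

    times  -- (T,) observation timestamps
    xy     -- (T, 2) observed x/y positions
    degree -- polynomial degree (assumed hyperparameter)
    Returns coefficient arrays usable with np.polyval.
    """
    cx = np.polyfit(times, xy[:, 0], degree)
    cy = np.polyfit(times, xy[:, 1], degree)
    return cx, cy

def continue_trajectory(coeffs, future_times):
    """Evaluate the fitted polynomials at future timestamps."""
    cx, cy = coeffs
    return np.stack([np.polyval(cx, future_times),
                     np.polyval(cy, future_times)], axis=-1)

# Toy example: a gently curving track, extrapolated half a second ahead.
t = np.linspace(0.0, 2.0, 21)
track = np.stack([t * 5.0, 0.2 * t ** 2], axis=-1)   # (21, 2)
coeffs = fit_trajectory(t, track)
future = continue_trajectory(coeffs, np.linspace(2.1, 2.5, 5))
print(future.shape)  # (5, 2)
```

The appeal of the representation is that a whole track compresses to a few coefficients, so a generative model can denoise coefficients instead of raw waypoints.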
Arxiv 2025-04-08
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | [9.5] 2504.03875 3D Scene Understanding Through Local Random Access Sequence Modeling [{'name': 'Wanhee Lee, Klemen Kotar, Rahul Mysore Venkatesh, Jared Watrous, Honglin Chen, Khai Loong Aw, Daniel L. K. Yamins'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D scene understanding novel view synthesis depth estimation |
Input: Single images 单幅图像 Step1: Local patch quantization 局部图块量化 Step2: Randomly ordered sequence generation 随机顺序生成 Step3: 3D scene editing via optical flow 通过光流进行三维场景编辑 Output: Enhanced capabilities for 3D scene understanding 改进的三维场景理解能力 |
| 9.5 | [9.5] 2504.03886 WildGS-SLAM: Monocular Gaussian Splatting SLAM in Dynamic Environments [{'name': 'Jianhao Zheng, Zihan Zhu, Valentin Bieri, Marc Pollefeys, Songyou Peng, Iro Armeni'}] |
Simultaneous Localization and Mapping (SLAM) 同时定位与地图构建 | v2 3D reconstruction 三维重建 dynamic environments 动态环境 SLAM 同时定位与地图构建 |
Input: Monocular video sequence 单目视频序列 Step1: Generate uncertainty map 生成不确定性地图 Step2: Dynamic object removal 动态物体移除 Step3: Dense bundle adjustment and Gaussian map optimization 密集束调整与高斯地图优化 Output: 3D Gaussian map and camera trajectory 3D高斯地图和相机轨迹 |
| 9.5 | [9.5] 2504.04190 Interpretable Single-View 3D Gaussian Splatting using Unsupervised Hierarchical Disentangled Representation Learning [{'name': 'Yuyang Zhang, Baao Xie, Hu Zhu, Qi Wang, Huanting Guo, Xin Jin, Wenjun Zeng'}] |
3D Reconstruction 三维重建 | v2 3D reconstruction Gaussian Splatting interpretability disentangled representation learning single-view |
Input: Single-view images 单视角图像 Step1: Data integration 数据集成 Step2: Hierarchical disentangled representation learning (DRL) 层次化解耦表征学习 Step3: 3D geometry and appearance disentanglement 3D几何和外观解耦 Output: Interpretable and high-quality 3D models 可解释的高质量3D模型 |
| 9.5 | [9.5] 2504.04294 3R-GS: Best Practice in Optimizing Camera Poses Along with 3DGS [{'name': 'Zhisheng Huang, Peng Wang, Jingdong Zhang, Yuan Liu, Xin Li, Wenping Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting Structure-from-Motion camera pose optimization |
Input: 3D Gaussian representations and camera poses 3D高斯表示和相机姿态 Step1: Joint optimization of 3D Gaussians and camera parameters 联合优化3D高斯和相机参数 Step2: Implement 3DGS-MCMC for robustness 实施3DGS-MCMC以增强鲁棒性 Step3: Use an MLP for camera pose refinement 使用多层感知机(MLP)进行相机姿态优化 Output: High-quality novel views and accurate camera poses 输出:高质量的新视图和准确的相机姿态 |
| 9.5 | [9.5] 2504.04448 Thermoxels: a voxel-based method to generate simulation-ready 3D thermal models [{'name': 'Etienne Chassaing, Florent Forest, Olga Fink, Malcolm Mielle'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction finite element analysis thermal modeling voxel-based modeling |
Input: Sparse RGB and thermal images 稀疏RGB图像和热成像图像 Step1: Voxel representation 体素表示 Step2: Geometry and temperature optimization 几何和温度优化 Step3: Model transformation to tetrahedral meshes 将模型转换为四面体网格 Output: FEA-compatible 3D models 输出: 兼容FEA的3D模型 |
| 9.5 | [9.5] 2504.04454 PRISM: Probabilistic Representation for Integrated Shape Modeling and Generation [{'name': 'Lei Cheng, Mahdi Saleh, Qing Cheng, Lu Sang, Hongli Xu, Daniel Cremers, Federico Tombari'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D shape generation Statistical Shape Models (SSM) Gaussian Mixture Models (GMM) |
Input: Real-world objects 真实物体 Step1: Integration of Statistical Shape Models (SSM) and Gaussian Mixture Models (GMM) 整合统计形状模型和高斯混合模型 Step2: Application of categorical diffusion models 应用类别扩散模型 Step3: Shape generation and manipulation 形状生成与操作 Output: High-fidelity, structurally coherent 3D shapes 高保真、结构一致的三维形状 |
| 9.5 | [9.5] 2504.04597 Targetless LiDAR-Camera Calibration with Anchored 3D Gaussians [{'name': 'Haebeom Jung, Namtae Kim, Jungwoo Kim, Jaesik Park'}] |
3D Reconstruction and Modeling 三维重建 | v2 LiDAR-camera calibration 3D Gaussian autonomous driving |
Input: LiDAR and camera data 激光雷达与相机数据 Step1: Freeze reliable LiDAR points as anchors 固定可靠的激光雷达点作为锚点 Step2: Jointly optimize sensor poses and Gaussian parameters 联合优化传感器姿态和高斯参数 Step3: Evaluate using photometric loss 使用光度损失进行评估 Output: Improved calibration poses 改进的标定姿态 |
| 9.5 | [9.5] 2504.04679 DeclutterNeRF: Generative-Free 3D Scene Recovery for Occlusion Removal [{'name': 'Wanzhou Liu, Zhexiao Xiong, Xinyu Li, Nathan Jacobs'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction occlusion removal Neural Radiance Fields |
Input: Incomplete images 不完整图像 Step1: Joint multi-view optimization of learnable camera parameters 学习相机参数的多视角联合优化 Step2: Application of occlusion annealing regularization 应用遮挡退火正则化 Step3: Use of stochastic structural similarity loss 使用随机结构相似性损失 Output: High-quality 3D scene reconstructions 高质量的三维场景重建 |
| 9.5 | [9.5] 2504.04732 Inverse++: Vision-Centric 3D Semantic Occupancy Prediction Assisted with 3D Object Detection [{'name': 'Zhenxing Ming, Julie Stephany Berrio, Mao Shan, Stewart Worrall'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D semantic occupancy prediction 3D语义占用预测 autonomous vehicles 自动驾驶 3D object detection 3D物体检测 |
Input: Surround-view images 360度视图图像 Step1: Introduce 3D object detection auxiliary branch 引入3D物体检测辅助分支 Step2: Enhance intermediate feature supervision 增强中间特征监督 Step3: Generate 3D semantic occupancy grid 生成3D语义占用网格 Output: Improved 3D perception capabilities 改进的3D感知能力 |
| 9.5 | [9.5] 2504.05170 SSLFusion: Scale & Space Aligned Latent Fusion Model for Multimodal 3D Object Detection [{'name': 'Bonan Ding, Jin Xie, Jing Nie, Jiale Cao'}] |
3D Object Detection 三维物体检测 | v2 3D object detection feature fusion autonomous systems |
Input: Multi-modal data (LiDAR and camera images) 输入: 多模态数据(激光雷达和摄像机图像) Step1: Feature extraction 特征提取 Step2: Scale-aligned feature fusion 按尺度对齐的特征融合 Step3: 3D-to-2D space alignment 3D到2D空间对齐 Step4: Cross-modal latent fusion 跨模态潜变量融合 Output: Accurate 3D object detection results 输出: 精确的3D物体检测结果 |
| 9.5 | [9.5] 2504.05249 Texture2LoD3: Enabling LoD3 Building Reconstruction With Panoramic Images [{'name': 'Wenzhao Tang, Weihang Li, Xiucheng Liang, Olaf Wysocki, Filip Biljecki, Christoph Holst, Boris Jutzi'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D building reconstruction LoD3 panoramic images semantic segmentation |
Input: Panoramic street-level images 全景街景图像 Step1: Image-to-object matching 图像与对象匹配 Step2: 3D model B-Rep surface simplification 3D模型边界表示表面简化 Step3: Ortho-rectification of images 图像正射校正 Step4: Facade segmentation 立面分割 Output: Enhanced Level of Detail 3D building models 改进的细节层次(LoD) 3D建筑模型 |
| 9.2 | [9.2] 2504.03868 Control Map Distribution using Map Query Bank for Online Map Generation [{'name': 'Ziming Liu, Leichen Wang, Ge Yang, Xinrun Li, Xingtao Hu, Hao Sun, Guangyu Gao'}] |
Autonomous Systems and Robotics 自动驾驶系统与机器人 | v2 High-definition maps 高清地图 Online map generation 在线地图生成 Autonomous driving 自动驾驶 Transformers 变换器 |
Input: Low-cost standard definition map data (SD map) 低成本标准清晰度地图数据 Step1: Map query bank decomposition 地图查询库分解 Step2: Initial distribution generation for scenarios 场景的初始分布生成 Step3: Map predictions optimization 地图预测优化 Output: Optimized HD maps 优化的高清地图 |
| 9.2 | [9.2] 2504.05303 InteractVLM: 3D Interaction Reasoning from 2D Foundational Models [{'name': "Sai Kumar Dwivedi, Dimitrije Anti\'c, Shashank Tripathi, Omid Taheri, Cordelia Schmid, Michael J. Black, Dimitrios Tzionas"}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction human-object interaction Vision-Language Models |
Input: In-the-wild images 野外拍摄的图像 Step1: Multi-view rendering 多视角渲染 Step2: 2D contact mask prediction 2D接触掩膜预测 Step3: 3D lifting of contact points 3D接触点提升 Output: 3D contact points 3D接触点 |
| 9.0 | [9.0] 2504.04457 VSLAM-LAB: A Comprehensive Framework for Visual SLAM Methods and Datasets [{'name': 'Alejandro Fontan, Tobias Fischer, Javier Civera, Michael Milford'}] |
Simultaneous Localization and Mapping (SLAM) 同时定位与地图构建 | v2 Visual SLAM benchmarking robotics |
Input: VSLAM algorithms and datasets Step1: Standardization of datasets and evaluation metrics Step2: Automation of dataset downloading and preprocessing Step3: Streamlined configuration and execution of experiments Output: Efficient benchmarking of VSLAM systems |
| 9.0 | [9.0] 2504.04753 CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images [{'name': 'Cheng Chen, Jiacheng Wei, Tianrun Chen, Chi Zhang, Xiaofeng Yang, Shangzhan Zhang, Bingchen Yang, Chuan-Sheng Foo, Guosheng Lin, Qixing Huang, Fayao Liu'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction CAD models image generation multi-view images geometric features |
Input: Unconstrained real-world CAD images 非约束的现实世界CAD图像 Step1: Geometry encoding 几何编码 Step2: Latent diffusion modeling 潜在扩散建模 Step3: Code checking for validity 代码有效性检查 Output: Generated parametric CAD models 生成的参数化CAD模型 |
| 8.5 | [8.5] 2504.04124 EMF: Event Meta Formers for Event-based Real-time Traffic Object Detection [{'name': 'Muhammad Ahmed Ullah Khan, Abdul Hannan Khan, Andreas Dengel'}] |
Autonomous Driving 自动驾驶 | v2 event-based detection autonomous driving object detection |
Input: Event camera data 事件相机数据 Step1: Develop Event Progression Extractor module 开发事件进展提取模块 Step2: Implement Metaformer architecture 实现Metaformer架构 Step3: Evaluate on traffic object detection benchmarks 在交通物体检测基准上进行评估 Output: Efficient traffic object detection model 高效的交通物体检测模型 |
| 8.5 | [8.5] 2504.04158 JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration [{'name': 'Yunlong Lin, Zixu Lin, Haoyu Chen, Panwang Pan, Chenxin Li, Sixiang Chen, Yeying Jin, Wenbo Li, Xinghao Ding'}] |
Autonomous Systems and Robotics 自动驾驶 | v2 autonomous driving image restoration perception systems vision-language models |
Input: Real-world degraded images 真实世界退化图像 Step1: Model integration 模型集成 Step2: Two-stage framework development 二阶段框架开发 Step3: Evaluation on CleanBench dataset 在CleanBench数据集上评估 Output: Enhanced perception metrics 改进的感知指标 |
| 8.5 | [8.5] 2504.04348 OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning [{'name': 'Shihao Wang, Zhiding Yu, Xiaohui Jiang, Shiyi Lan, Min Shi, Nadine Chang, Jan Kautz, Ying Li, Jose M. Alvarez'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 3D driving vision-language models |
Input: 3D driving tasks and vision-language dataset 3D驾驶任务和视觉语言数据集 Step 1: Data generation using counterfactual reasoning 基于反事实推理的数据生成 Step 2: Framework evaluation with Omni-L and Omni-Q Omni-L与Omni-Q的框架评估 Output: Improved decision-making models 改进的决策模型 |
| 8.5 | [8.5] 2504.04540 The Point, the Vision and the Text: Does Point Cloud Boost Spatial Reasoning of Large Language Models? [{'name': 'Weichen Zhang, Ruiying Peng, Chen Gao, Jianjie Fang, Xin Zeng, Kaiyuan Li, Ziyou Wang, Jinqiang Cui, Xin Wang, Xinlei Chen, Yong Li'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D spatial reasoning point clouds Large Language Models |
Input: Point clouds, visual and text inputs 3D点云、视觉和文本输入 Step1: Evaluating spatial reasoning 评估空间推理能力 Step2: Developing a benchmark 开发基准 Step3: Analyzing model performance 分析模型性能 Output: Insights into 3D LLMs 对3D LLM的洞察 |
| 8.5 | [8.5] 2504.04631 Systematic Literature Review on Vehicular Collaborative Perception -- A Computer Vision Perspective [{'name': 'Lei Wan, Jianxin Zhao, Andreas Wiedholz, Manuel Bied, Mateus Martinez de Lucena, Abhishek Dinkar Jagtap, Andreas Festag, Ant\^onio Augusto Fr\"ohlich, Hannan Ejaz Keen, Alexey Vinel'}] |
Autonomous Systems and Robotics 自动驾驶 | v2 Collaborative Perception Autonomous Vehicles Computer Vision |
Input: 106 peer-reviewed articles 106篇同行评审的文章 Step1: Literature selection 文献选择 Step2: Comparative analysis 比较分析 Step3: Identify research gaps 确定研究空白 Output: Systematic insights on CP 系统的CP洞察 |
| 8.5 | [8.5] 2504.04701 DFormerv2: Geometry Self-Attention for RGBD Semantic Segmentation [{'name': 'Bo-Wen Yin, Jiao-Long Cao, Ming-Ming Cheng, Qibin Hou'}] |
3D Reconstruction and Modeling 三维重建 | v2 RGBD segmentation geometry prior self-attention |
Input: RGB and depth images RGB与深度图像 Step1: Feature extraction 特征提取 Step2: Geometry self-attention mechanism 几何自注意力机制 Step3: Model evaluation 模型评估 Output: Semantic segmentation results 语义分割结果 |
| 8.5 | [8.5] 2504.04744 Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions [{'name': 'He Zhu, Quyu Kong, Kechun Xu, Xunlong Xia, Bing Deng, Jieping Ye, Rong Xiong, Yue Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D object affordance vision-language model robotics |
Input: Language instructions, visual observations, and interactions 语言指令、视觉观测和交互 Step1: Dataset collection 数据集收集 Step2: Multi-modal feature fusion 多模态特征融合 Step3: Model implementation and evaluation 模型实施与评估 Output: Grounded 3D object affordance 定位的3D对象可供性 |
| 8.5 | [8.5] 2504.04781 OCC-MLLM-CoT-Alpha: Towards Multi-stage Occlusion Recognition Based on Large Language Models via 3D-Aware Supervision and Chain-of-Thoughts Guidance [{'name': 'Chaoyi Wang, Baoqing Li, Xinhan Di'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 3D-aware supervision occlusion recognition multi-modal large language models |
Input: Multi-modal vision-language model and 3D expert reconstruction model 多模态视觉语言模型和3D专家重建模型 Step1: Pre-train the vision-language model 预训练视觉语言模型 Step2: Train the 3D expert reconstruction model 训练3D专家重建模型 Step3: Implement Chain-of-Thoughts learning 实施思维链学习 Output: Enhanced recognition of occluded objects 改进的遮挡物体识别 |
| 8.5 | [8.5] 2504.04837 Uni4D: A Unified Self-Supervised Learning Framework for Point Cloud Videos [{'name': 'Zhi Zuo, Chenyi Zhuang, Zhiqiang Shen, Pan Gao, Jie Qin'}] |
Point Cloud Processing 点云处理 | v2 point cloud videos self-supervised learning 4D representation |
Input: Point cloud videos 点云视频 Step1: Model motion representation in latent space 在潜在空间中建模运动表示 Step2: Introduce latent and geometry tokens 引入潜在和几何标记 Step3: Train self-disentangled MAE 训练自解耦MAE Output: Discriminative 4D representations 判别性的4D表示 |
| 8.5 | [8.5] 2504.04841 Prior2Former -- Evidential Modeling of Mask Transformers for Assumption-Free Open-World Panoptic Segmentation [{'name': 'Sebastian Schmidt, Julius K\"orner, Dominik Fuchsgruber, Stefano Gasperini, Federico Tombari, Stephan G\"unnemann'}] |
Autonomous Systems and Robotics 自动驾驶 | v2 Panoptic segmentation 全景分割 Anomaly detection 异常检测 Evidential learning 证据学习 Autonomous driving 自动驾驶 |
Input: Pixel-wise binary mask assignments 像素级二进制掩模分配 Step1: Incorporate Beta prior 引入Beta先验 Step2: Compute model uncertainty 计算模型不确定性 Step3: Perform anomaly and panoptic segmentation 执行异常检测和全景分割 Output: State-of-the-art segmentation results 最先进的分割结果 (a Beta-prior uncertainty sketch follows this table) |
| 8.5 | [8.5] 2504.05075 PvNeXt: Rethinking Network Design and Temporal Motion for Point Cloud Video Recognition [{'name': 'Jie Wang, Tingfa Xu, Lihe Ding, Xinjie Zhang, Long Bai, Jianan Li'}] |
Point Cloud Processing 点云处理 | v2 point cloud recognition 4D representation learning |
Input: Point cloud video sequences 点云视频序列 Step1: Motion capture through Motion Imitator 通过运动模仿器捕获运动 Step2: One-step query operation from Single-Step Motion Encoder 通过单步运动编码器进行单步查询操作 Output: Efficient point cloud video recognition 高效的点云视频识别 |
| 8.5 | [8.5] 2504.05148 Stereo-LiDAR Fusion by Semi-Global Matching With Discrete Disparity-Matching Cost and Semidensification [{'name': 'Yasuhiro Yao, Ryoichi Ishikawa, Takeshi Oishi'}] |
Depth Estimation 深度估计 | v2 Depth Estimation 深度估计 Sensor Fusion 传感器融合 Autonomous Systems 自主系统 |
Input: Stereo camera images and LiDAR data 立体相机图像和LiDAR数据 Step1: Apply Semi-Global Matching (SGM) to estimate disparity 使用Semi-Global Matching (SGM)估计视差 Step2: Implement Discrete Disparity-matching Cost (DDC) for disparity evaluation 实现离散视差匹配成本 (DDC) 用于视差评估 Step3: Perform semidensification to enhance disparity maps 进行半密集化以增强视差图 Step4: Execute stereo-LiDAR consistency check for validation 执行立体-激光雷达一致性检查以进行验证 Output: Accurate depth maps with improved performance 输出:准确的深度图并提高性能 |
| 8.5 | [8.5] 2504.05152 PanoDreamer: Consistent Text to 360-Degree Scene Generation [{'name': 'Zhexiao Xiong, Zhang Chen, Zhong Li, Yi Xu, Nathan Jacobs'}] |
3D Generation 三维生成 | v2 3D generation text to 3D geometric consistency |
Input: Text description and/or reference image 文本描述和/或参考图像 Step1: Generate initial panoramic scene 生成初始全景场景 Step2: Lift panorama into 3D 提升全景至三维 Step3: Generate images from different viewpoints 根据不同视点生成图像 Step4: Compose images into a global point cloud 将图像合成全局点云 Step5: Use 3D Gaussian Splatting for final scene rendering 使用3D高斯点云进行最终场景渲染 |
| 8.5 | [8.5] 2504.05201 3D Universal Lesion Detection and Tagging in CT with Self-Training [{'name': 'Jared Frazier, Tejas Sudharshan Mathai, Jianfei Liu, Angshuman Paul, Ronald M. Summers'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D lesion detection self-training computed tomography |
Input: CT images 计算机断层扫描图像 Step1: Train VFNet model for 2D detection 训练VFNet模型进行二维检测 Step2: Expand 2D detection to 3D 将二维检测扩展到三维 Step3: Self-training with 3D proposals 使用3D候选进行自训练 Output: Tagged 3D lesions 标记的三维病变 |
| 7.5 | [7.5] 2504.04099 TARAC: Mitigating Hallucination in LVLMs via Temporal Attention Real-time Accumulative Connection [{'name': 'Chunzhao Xie, Tongxuan Liu, Lei Jiang, Yuting Zeng, jinrong Guo, Yunheng Shen, Weizhe Huang, Jing Li, Xiaohua Xu'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models hallucination mitigation Temporal Attention |
Input: Large Vision-Language Models (LVLMs) Step 1: Investigate attention decay correlation with hallucinations Step 2: Propose Temporal Attention Real-time Accumulative Connection (TARAC) Step 3: Integrate TARAC into existing LVLM architectures Output: Enhanced attention mechanisms mitigating hallucinations |
| 7.5 | [7.5] 2504.04676 Dual Consistent Constraint via Disentangled Consistency and Complementarity for Multi-view Clustering [{'name': 'Bo Li, Jing Yun'}] |
Multi-view and Stereo Vision 多视角与立体视觉 | v2 Multi-view clustering 多视角聚类 Consistency 一致性 Complementarity 互补性 |
Input: Multi-view data 多视角数据 Step1: Separate shared and private information 分离共享和私有信息 Step2: Learn consistency by maximizing mutual information via contrastive learning 通过对比学习最大化互信息 Step3: Apply dual consistency constraints 使用双一致性约束 Output: Improved clustering performance 改进的聚类性能 |
| 7.5 | [7.5] 2504.04911 IterMask3D: Unsupervised Anomaly Detection and Segmentation with Test-Time Iterative Mask Refinement in 3D Brain MR [{'name': 'Ziyun Liang, Xiaoqing Guo, Wentian Xu, Yasin Ibrahim, Natalie Voets, Pieter M Pretorius, J. Alison Noble, Konstantinos Kamnitsas'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction anomaly detection MRI segmentation |
Input: 3D Brain MRI scans 3D 脑部 MRI 扫描 Step1: Spatial masking of images 空间掩蔽图像 Step2: Iterative mask refinement 迭代掩码细化 Step3: Anomaly reconstruction 异常重建 Output: Segmented anomalies 分割出的异常 |
| 7.0 | [7.0] 2504.04740 Enhancing Compositional Reasoning in Vision-Language Models with Synthetic Preference Data [{'name': 'Samarth Mishra, Kate Saenko, Venkatesh Saligrama'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 compositional reasoning vision-language models multimodal learning |
Input: Multimodal large language models (MLLMs) 多模态大型语言模型 Step1: Synthetic data generation for augmentation 用于数据增强的合成数据生成 Step2: Preference tuning on synthetic data 通过合成数据进行偏好调整 Step3: Model evaluation on compositional benchmarks 模型在组合推理基准上的评估 Output: Improved compositional reasoning capabilities 改进的组合推理能力 |
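Prior2Former (2504.04841) above derives per-pixel uncertainty by placing a Beta prior on mask evidence. The sketch below walks through the standard Beta-distribution bookkeeping such evidential models rely on, assuming the network already emits non-negative foreground/background evidence maps; the update and variance formulas are textbook Beta facts, not the paper's exact loss.

```python
import numpy as np

def beta_from_evidence(ev_fg, ev_bg, prior_alpha=1.0, prior_beta=1.0):
    """Combine predicted evidence with a Beta(prior_alpha, prior_beta) prior."""
    alpha = prior_alpha + ev_fg
    beta = prior_beta + ev_bg
    return alpha, beta

def beta_mean_and_variance(alpha, beta):
    """Per-pixel foreground probability and its epistemic spread.

    For Beta(a, b):  mean = a / (a + b)
                     var  = a * b / ((a + b)**2 * (a + b + 1))
    """
    s = alpha + beta
    mean = alpha / s
    var = alpha * beta / (s ** 2 * (s + 1.0))
    return mean, var

# Toy 2x2 evidence maps: the bottom-right pixel has little evidence either
# way, so its variance (uncertainty) stays high and could flag an anomaly.
ev_fg = np.array([[9.0, 9.0], [0.5, 0.1]])
ev_bg = np.array([[0.5, 0.5], [9.0, 0.1]])
mean, var = beta_mean_and_variance(*beta_from_evidence(ev_fg, ev_bg))
print(np.round(mean, 2))
print(np.round(var, 3))
```

Thresholding the variance map is one simple way such a model can separate confident in-distribution pixels from unknown objects.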
Arxiv 2025-04-07
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | [9.5] 2504.03052 Cooperative Inference for Real-Time 3D Human Pose Estimation in Multi-Device Edge Networks [{'name': 'Hyun-Ho Choi, Kangsoo Kim, Ki-Ho Lee, Kisong Lee'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D pose estimation cooperative inference mobile edge computing real-time processing |
Input: Images captured by multiple cameras 多摄像头捕获的图像 Step1: 2D pose estimation from images 从图像中估计二维姿态 Step2: Offloading filtered images to edge server 将筛选后的图像转发到边缘服务器 Step3: 3D joint coordinate calculation on edge server 在边缘服务器上计算三维关节坐标 Output: Real-time 3D pose estimation 实时三维姿态估计 |
| 9.5 | [9.5] 2504.03059 Compressing 3D Gaussian Splatting by Noise-Substituted Vector Quantization [{'name': 'Haishan Wang, Mohammad Hassan Vali, Arno Solin'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting memory compression 3D reconstruction |
Input: 3D Gaussian Splatting models 3D高斯点云模型 Step1: Build attribute codebooks 构建属性码本 Step2: Apply noise-substituted vector quantization 应用噪声替代的向量量化 Step3: Optimize memory usage 优化内存使用 Output: Compressed 3D representations 压缩的三维表示 |
| 9.5 | [9.5] 2504.03164 NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving [{'name': 'Kexin Tian, Jingrui Mao, Yunlong Zhang, Jiwan Jiang, Yang Zhou, Zhengzhong Tu'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models autonomous driving spatial reasoning 3D scene graph |
Input: NuScenes dataset with multi-modal sensor data NuScenes数据集与多模态传感器数据 Step1: 3D scene graph generation pipeline 3D场景图生成管道 Step2: QA generation pipeline 问答生成管道 Step3: Evaluation of VLMs on spatial understanding and reasoning VLM在空间理解和推理上的评估 Output: Benchmark for VLMs in autonomous driving 自动驾驶中的VLM基准 |
| 9.5 | [9.5] 2504.03177 Detection Based Part-level Articulated Object Reconstruction from Single RGBD Image [{'name': 'Yuki Kawana, Tatsuya Harada'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction articulated objects RGBD images |
Input: Single RGBD image 单个RGBD图像 Step1: Part detection 部件检测 Step2: Kinematics-aware part fusion 运动学感知部件融合 Step3: Anisotropic scale normalization 各向异性尺度归一化 Step4: Cross-refinement in output space 在输出空间进行交叉细化 Output: Reconstructed articulated shapes 重建的关节形状 |
| 9.5 | [9.5] 2504.03198 Endo3R: Unified Online Reconstruction from Dynamic Monocular Endoscopic Video [{'name': 'Jiaxin Guo, Wenzhen Dong, Tianyu Huang, Hao Ding, Ziyi Wang, Haomin Kuang, Qi Dou, Yun-Hui Liu'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction monocular video surgical robotics |
Input: Monocular surgical videos 单目外科视频 Step1: Data integration 数据集成 Step2: Algorithm implementation 算法实现 Step3: Uncertainty measurement 不确定性测量 Step4: Pointmap and depth prediction 点图和深度预测 Output: 3D models and camera parameters 3D模型和相机参数 |
| 9.5 | [9.5] 2504.03258 TQD-Track: Temporal Query Denoising for 3D Multi-Object Tracking [{'name': 'Shuxiao Ding, Yutong Yang, Julian Wiederer, Markus Braun, Peizheng Li, Juergen Gall, Bin Yang'}] |
3D Multi-Object Tracking 3D多目标跟踪 | v2 3D tracking query denoising autonomous driving |
Input: Ground truth detections from previous frame Step1: Generate denoising queries with noise Step2: Propagate denoising queries to current frame Step3: Predict corresponding ground truths Output: Enhanced tracking results |
| 9.5 | [9.5] 2504.03438 ZFusion: An Effective Fuser of Camera and 4D Radar for 3D Object Perception in Autonomous Driving [{'name': 'Sheng Yang, Tong Zhan, Shichen Qiao, Jicheng Gong, Qing Yang, Yanfeng Lu, Jian Wang'}] |
3D Object Detection 3D物体检测 | v2 3D object perception autonomous driving 4D radar |
Input: 4D radar and camera data 4D 雷达和相机数据 Step1: Fusion of sensor data 传感器数据融合 Step2: Feature extraction 特征提取 Step3: 3D object detection algorithm 3D物体检测算法 Output: Improved object perception 改进的物体感知 |
| 9.5 | [9.5] 2504.03563 PF3Det: A Prompted Foundation Feature Assisted Visual LiDAR 3D Detector [{'name': 'Kaidong Li, Tianxiao Zhang, Kuan-Chuan Peng, Guanghui Wang'}] |
3D Object Detection and LiDAR Fusion 3D对象检测与激光雷达融合 | v2 3D detection 3D检测 LiDAR autonomous driving 自动驾驶 |
Input: Camera images and LiDAR point clouds 摄像机图像和激光雷达点云 Step1: Data preprocessing 数据预处理 Step2: Feature extraction using foundation model encoders 使用基础模型编码器进行特征提取 Step3: Soft prompt integration for feature fusion 用于特征融合的软提示集成 Step4: 3D detection model training 3D检测模型训练 Output: Enhanced 3D object detection results 改进的3D物体检测结果 |
| 9.5 | [9.5] 2504.03602 Robust Human Registration with Body Part Segmentation on Noisy Point Clouds [{'name': 'Kai Lascheit, Daniel Barath, Marc Pollefeys, Leonidas Guibas, Francis Engelmann'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D human meshes body-part segmentation pose estimation noisy point clouds mesh fitting |
Input: Noisy point clouds 噪声点云 Step1: Body-part segmentation 人体部位分割 Step2: SMPL-X fitting SMPL-X拟合 Step3: Pose and orientation initialization 姿态和方向初始化 Step4: Refinement of point cloud alignment 点云对齐细化 Output: Accurate human mesh 准确的人体网格 |
| 9.0 | [9.0] 2504.03536 HumanDreamer-X: Photorealistic Single-image Human Avatars Reconstruction via Gaussian Restoration [{'name': 'Boyuan Wang, Runqi Ouyang, Xiaofeng Wang, Zheng Zhu, Guosheng Zhao, Chaojun Ni, Guan Huang, Lihong Liu, Xingang Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction human avatars autonomous driving |
Input: Single human image 单幅人像图像 Step1: 3D Gaussian Splatting for initial geometry 初步几何构建 Step2: Multi-view generation through integration 多视角图像生成 Step3: HumanFixer for restoration and refinement 修复与改进流程 Output: High-quality, animatable human avatars 输出:高质量可动画人形模型 |
| 8.5 | [8.5] 2504.02884 Enhancing Traffic Sign Recognition On The Performance Based On Yolov8 [{'name': 'Baba Ibrahim (Hubei University of Automotive Technology,Hubei University of Automotive Technology), Zhou Kui (Hubei University of Automotive Technology,Hubei University of Automotive Technology)'}] |
Autonomous Driving 自动驾驶 | v2 Traffic Sign Recognition Yolov8 Autonomous Driving |
Input: Traffic sign images 交通标志图像 Step1: Data augmentation 数据增强 Step2: Model training using YOLOv8 使用YOLOv8模型训练 Step3: Model evaluation on various datasets 在不同数据集上评估模型 Output: Enhanced detection models 改进的检测模型 |
| 8.5 | [8.5] 2504.02920 LiDAR-based Object Detection with Real-time Voice Specifications [{'name': 'Anurag Kulkarni'}] |
Autonomous Systems and Robotics 自动驾驶 | v2 LiDAR object detection autonomous driving real-time voice synthesis |
Input: LiDAR and RGB data LiDAR和RGB数据 Step1: Data integration 数据集成 Step2: Object detection algorithm development 物体检测算法开发 Step3: Real-time voice synthesis implementation 实时语音合成实现 Output: Real-time feedback and 3D visualizations 实时反馈和3D可视化 |
| 8.5 | [8.5] 2504.03047 Attention-Aware Multi-View Pedestrian Tracking [{'name': 'Reef Alturki, Adrian Hilton, Jean-Yves Guillemaut'}] |
Multi-view Stereo 多视角立体 | v2 multi-view tracking attention mechanisms pedestrian detection |
Input: Multi-view images 多视角图像 Step1: Early fusion for detection 早期融合进行检测 Step2: Cross-attention mechanism for association 使用交叉注意力机制进行关联 Step3: Robust feature propagation 鲁棒特征传播 Output: Enhanced pedestrian tracking performance 改进的行人跟踪性能 (a cross-attention sketch follows this table) |
| 8.5 | [8.5] 2504.03089 SLACK: Attacking LiDAR-based SLAM with Adversarial Point Injections [{'name': 'Prashant Kumar, Dheeraj Vattikonda, Kshitij Madhav Bhat, Kunal Dargan, Prem Kalra'}] |
Simultaneous Localization and Mapping (SLAM) 同时定位与地图构建 | v2 LiDAR-based SLAM autonomous driving adversarial attacks point injections |
Input: LiDAR scans from autonomous vehicles Step1: Develop a novel autoencoder with segmentation-based attention Step2: Integrate contrastive learning for precise LiDAR reconstructions Step3: Implement point injections to test adversarial attacks Output: Efficacy of point injections on SLAM navigation |
| 8.5 | [8.5] 2504.03171 Real-Time Roadway Obstacle Detection for Electric Scooters Using Deep Learning and Multi-Sensor Fusion [{'name': 'Zeyang Zheng, Arman Hosseini, Dong Chen, Omid Shoghli, Arsalan Heydarian'}] |
Autonomous Systems and Robotics 自动驾驶系统与机器人技术 | v2 obstacle detection e-scooter deep learning sensor fusion |
Input: RGB camera and depth camera RGB相机和深度相机 Step1: Sensor integration 传感器集成 Step2: Obstacle detection using YOLO 使用YOLO进行障碍物检测 Step3: Depth data analysis 深度数据分析 Output: Real-time obstacle detection results 实时障碍物检测结果 |
| 8.5 | [8.5] 2504.03193 Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation [{'name': 'Xin Zhang, Robby T. Tan'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Domain Generalized Semantic Segmentation Vision Foundation Models Vision-Language Models autonomous driving computational efficiency |
Input: Domain data and models 领域数据与模型 Step1: Feature extraction 特征提取 Step2: Model adaptation 模型适应 Step3: Domain generalization evaluation 域泛化评估 Output: Enhanced segmentation performance 改进的分割性能 |
| 8.5 | [8.5] 2504.03306 Multi-Flow: Multi-View-Enriched Normalizing Flows for Industrial Anomaly Detection [{'name': 'Mathis Kruse, Bodo Rosenhahn'}] |
Multi-view Stereo 多视角立体 | v2 Multi-view anomaly detection 多视角异常检测 Normalizing flows 正规化流 Industrial applications 工业应用 |
Input: Multi-view images 多视角图像 Step1: Data fusion 融合数据 Step2: Cross-view message passing 跨视图信息传递 Step3: Anomaly detection 进行异常检测 Output: Detected anomalies 检测到的异常 |
| 8.5 | [8.5] 2504.03468 D-Garment: Physics-Conditioned Latent Diffusion for Dynamic Garment Deformations [{'name': 'Antoine Dumoulin, Adnane Boukhayma, Laurence Boissieux, Bharath Bhushan Damodaran, Pierre Hellier, Stefanie Wuhrer'}] |
3D Generation 三维生成 | v2 3D Garment Deformation Latent Diffusion Model Dynamic Modeling Vision Sensors |
Input: 3D garment template 3D服装模板 Step1: Condition on body shape and motion 以身体形状和运动为条件 Step2: Use latent diffusion model 使用潜在扩散模型 Step3: Optimize to fit observations 最优化以适应观测 Output: Dynamically deformed garment output 动态变形服装输出 |
| 8.5 | [8.5] 2504.03637 An Algebraic Geometry Approach to Viewing Graph Solvability [{'name': "Federica Arrigoni, Kathl\'en Kohn, Andrea Fusiello, Tomas Pajdla"}] |
Multi-view Geometry 多视图几何 | v2 Viewing Graph Structure-from-Motion Algebraic Geometry |
Input: Viewing graph associated with cameras 视图图与相机关联 Step1: Develop novel algebraic framework for solvability problems 提出新的代数框架用于求解问题 Step2: Analyze conditions for camera determinability 分析相机可确定性的条件 Step3: Implement computational methods for graph partitioning and solvability testing 实现图划分和求解测试的计算方法 Output: Improved understanding of structure-from-motion graphs and their solvability 改进对运动结构图及其可解性的理解 |
| 8.0 | [8.0] 2504.03249 Robot Localization Using a Learned Keypoint Detector and Descriptor with a Floor Camera and a Feature Rich Industrial Floor [{'name': 'Piet Br\"ommel, Dominik Br\"amer, Oliver Urbann, Diana Kleingarn'}] |
Autonomous Systems and Robotics 自动驾驶 | v2 robot localization feature extraction |
Input: Images of industrial floor 工业地面的图像 Step1: Keypoint extraction 关键点提取 Step2: Deep learning for features 深度学习获取特征 Step3: Position estimation 位置估计 Output: Accurate robot localization 准确的机器人定位 |
| 7.5 | [7.5] 2504.02876 Multimodal Reference Visual Grounding [{'name': 'Yangxiao Lu, Ruosen Li, Liqiang Jing, Jikai Wang, Xinya Du, Yunhui Guo, Nicholas Ruozzi, Yu Xiang'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 multimodal reference visual grounding large vision-language models few-shot object detection |
Input: Query image and reference images 输入: 查询图像和参考图像 Step1: Dataset creation for MRVG 创建MRVG数据集 Step2: Novel method for visual grounding using LLMs 开发基于LLMs的视觉定位新方法 Step3: Evaluation of the model's visual grounding performance 模型可视化定位性能的评估 Output: Bounding boxes or segmentation masks 输出: 目标对象的边界框或分割掩码 |
| 7.5 | [7.5] 2504.03140 Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models [{'name': 'Xuran Ma, Yexin Liu, Yaofu Liu, Xianfeng Wu, Mingzhe Zheng, Zihao Wang, Ser-Nam Lim, Harry Yang'}] |
Image and Video Generation 图像生成与视频生成 | v2 video generation diffusion models caching strategy |
Input: Video sequences 视频序列 Step1: Analyze attention distributions 分析注意力分布 Step2: Develop adaptive caching strategy 开发自适应缓存策略 Step3: Validate through experiments 实验验证 Output: Efficient video generation 高效视频生成 |
| 7.5 | [7.5] 2504.03154 TokenFLEX: Unified VLM Training for Flexible Visual Tokens Inference [{'name': 'Junshan Hu, Jialiang Mao, Zhikang Liu, Zhongpu Xia, Peng Jia, Xianpeng Lang'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models Token adaptation |
Input: Images 输入图像 Step1: Stochastic training of vision tokens 随机训练视觉令牌 Step2: Dynamic adjustment of token counts 动态调整令牌数量 Step3: Experiments on vision-language benchmarks 在视觉-语言基准上的实验 Output: Performance evaluation and comparison with fixed-token models 输出:与固定令牌模型的性能评估和比较 |
| 7.5 | [7.5] 2504.03440 Know What You do Not Know: Verbalized Uncertainty Estimation Robustness on Corrupted Images in Vision-Language Models [{'name': 'Mirko Borszukovszki, Ivo Pascal de Jong, Matias Valdenegro-Toro'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Visual Language Models Uncertainty Estimation Corrupted Images Large Language Models |
Input: Corrupted image data 受损图像数据 Step1: Model testing 模型测试 Step2: Uncertainty estimation 不确定性估计 Step3: Results analysis 结果分析 Output: Confidence scores 置信度分数 |
| 6.0 | [6.0] 2504.03490 BUFF: Bayesian Uncertainty Guided Diffusion Probabilistic Model for Single Image Super-Resolution [{'name': 'Zihao He, Shengchuan Zhang, Runze Hu, Yunhang Shen, Yan Zhang'}] |
Image Generation 图像生成 | v2 super-resolution diffusion models |
Input: Low-resolution images (LR) 低分辨率图像 Step1: Bayesian model generates uncertainty masks 贝叶斯模型生成不确定性掩码 Step2: Modulation of noise during diffusion process 在扩散过程中对噪声进行调制 Step3: Training with enhanced focus on high-uncertainty areas 在高不确定性区域进行增强关注的训练 Output: Super-resolved images 高分辨率图像 |
| 5.0 | [5.0] 2504.03254 SARLANG-1M: A Benchmark for Vision-Language Modeling in SAR Image Understanding [{'name': 'Yimin Wei, Aoran Xiao, Yexian Ren, Yuting Zhu, Hongruixuan Chen, Junshi Xia, Naoto Yokoya'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Synthetic Aperture Radar (SAR) Vision-Language Models (VLMs) Image Captioning Visual Question Answering (VQA) |
Input: SAR images and corresponding text annotations SAR 图像与对应文本注释 Step1: Dataset creation 数据集创建 Step2: Model training and evaluation 模型训练与评估 Output: Enhanced understanding of SAR images 改进的SAR图像理解 |
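The attention-aware multi-view tracker above (2504.03047) associates detections across views with a cross-attention mechanism. Below is a minimal NumPy sketch of scaled dot-product cross-attention, assuming per-detection feature vectors have already been extracted; the shapes and names are illustrative, not the paper's architecture.

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    queries -- (Nq, d) features from one view (e.g., current tracks)
    keys    -- (Nk, d) features from another view (e.g., new detections)
    values  -- (Nk, d) features to aggregate
    Returns (Nq, d) attended features and the (Nq, Nk) attention matrix.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)           # (Nq, Nk) similarity
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over detections
    return weights @ values, weights

# Toy example: 2 tracks attend over 3 detections with 8-dim features.
rng = np.random.default_rng(0)
tracks, dets = rng.normal(size=(2, 8)), rng.normal(size=(3, 8))
attended, attn = cross_attention(tracks, dets, dets)
print(attn.sum(axis=-1))  # each row sums to 1
```

The rows of `attn` can be read as soft association scores between tracks and candidate detections, which is what makes cross-attention a natural fit for multi-view data association.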
Arxiv 2025-04-04
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | [9.5] 2504.02261 WonderTurbo: Generating Interactive 3D World in 0.72 Seconds [{'name': 'Chaojun Ni, Xiaofeng Wang, Zheng Zhu, Weijie Wang, Haoyun Li, Guosheng Zhao, Jie Li, Wenkang Qin, Guan Huang, Wenjun Mei'}] |
3D Generation 三维生成 | v2 3D generation real-time rendering interactive 3D |
Input: User-provided single image 用户提供的单张图像 Step1: Implement StepSplat for geometric updates 实现StepSplat进行几何更新 Step2: Use QuickDepth for depth consistency 使用QuickDepth确保深度一致性 Step3: Apply FastPaint for appearance inpainting 应用FastPaint进行外观修复 Output: Interactive 3D scenes with high-quality output 输出: 高质量的交互式3D场景 |
| 9.5 | [9.5] 2504.02270 MinkOcc: Towards real-time label-efficient semantic occupancy prediction [{'name': 'Samuel Sze, Daniele De Martini, Lars Kunze'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D semantic occupancy prediction autonomous driving |
Input: Multi-view images and LiDAR data 多视角图像和激光雷达数据 Step1: Warm-start with small dataset of 3D annotations 用小型3D注释数据集进行热启动 Step2: Continued training with LiDAR sweeps and images 使用激光雷达扫描和图像进行后续训练 Step3: Real-time inference through sparse convolution networks 通过稀疏卷积网络实现实时推断 Output: 3D semantic occupancy prediction 3D语义占用预测 |
| 9.5 | [9.5] 2504.02316 ConsDreamer: Advancing Multi-View Consistency for Zero-Shot Text-to-3D Generation [{'name': 'Yuan Zhou, Shilong Jin, Litao Hua, Wanjun Lv, Haoran Duan, Jungong Han'}] |
3D Generation 三维生成 | text-to-3D generation multi-view consistency view biases visual quality geometry consistency |
Input: Text descriptions 文本描述 Step1: View Disentanglement Module (VDM) 视图解耦模块 Step2: Similarity-based partial order loss 相似性基础的部分顺序损失 Output: Geometrically consistent 3D generation 几何一致的3D生成 |
| 9.5 | [9.5] 2504.02337 LPA3D: 3D Room-Level Scene Generation from In-the-Wild Images [{'name': 'Ming-Jia Yang, Yu-Xiao Guo, Yang Liu, Bin Zhou, Xin Tong'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D room-level scene generation NeRF GAN |
Input: In-the-wild images 野外拍摄的图像 Step1: Define local-pose-alignment (LPA) framework 定义局部姿态对齐框架 Step2: Implement LPA-GAN for scene generation 实现LPA-GAN进行场景生成 Step3: Co-optimize pose predictor and scene generation 共同优化姿态预测器和场景生成 Output: Generated 3D indoor scenes 生成的3D室内场景 |
| 9.5 | [9.5] 2504.02356 All-day Depth Completion via Thermal-LiDAR Fusion [{'name': 'Janghyun Kim, Minseong Kweon, Jinsun Park, Ukcheol Shin'}] |
Depth Estimation 深度估计 | v2 Depth Completion 深度补全 Thermal-LiDAR Fusion 热激光雷达融合 Autonomous Driving 自动驾驶 |
Input: Sparse LiDAR and RGB images 稀疏激光雷达和RGB图像 Step1: Benchmark existing algorithms 对现有算法进行基准测试 Step2: Propose contrastive learning and pseudo-supervision framework 提出对比学习和伪监督框架 Step3: Enhance depth boundary clarity 改进深度边界的清晰度 Output: Enhanced depth completion performance 改进的深度补全性能 |
| 9.5 | [9.5] 2504.02437 MonoGS++: Fast and Accurate Monocular RGB Gaussian SLAM [{'name': 'Renwu Li, Wenjing Ke, Dong Li, Lu Tian, Emad Barsoum'}] |
Simultaneous Localization and Mapping (SLAM) 同时定位与地图构建 | v2 3D Gaussian mapping Simultaneous Localization and Mapping (SLAM) RGB inputs Visual odometry |
Input: RGB images 仅输入RGB图像 Step1: Dynamic 3D Gaussian insertion 动态三维高斯插入 Step2: Gaussian densification module 高斯密集模块 Step3: Online visual odometry 视觉里程计 Output: Accurate 3D mapping 准确的三维映射 |
| 9.5 | [9.5] 2504.02464 CornerPoint3D: Look at the Nearest Corner Instead of the Center [{'name': 'Ruixiao Zhang, Runwei Guan, Xiangyu Chen, Adam Prugel-Bennett, Xiaohao Cai'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D object detection LiDAR point clouds autonomous driving |
Input: LiDAR point clouds from 3D sensors Step1: Analyze object surfaces and centers Step2: Develop EdgeHead for surface detection Step3: Implement CornerPoint3D for corner prediction Output: Enhanced 3D object detection performance |
| 9.5 | [9.5] 2504.02762 MD-ProjTex: Texturing 3D Shapes with Multi-Diffusion Projection [{'name': 'Ahmet Burak Yildirim, Mustafa Utku Aydogdu, Duygu Ceylan, Aysegul Dundar'}] |
Image and Video Generation 图像生成 | v2 3D shapes text-guided texture generation multi-view consistency |
Input: Pretrained text-to-image diffusion models 预训练的文本到图像扩散模型 Step1: Implement multi-diffusion consistency mechanism 实现多扩散一致性机制 Step2: Fuse noise predictions from multiple views 融合来自多个视角的噪声预测 Step3: Generate coherent textures for 3D shapes 生成一致的3D形状纹理 Output: Fast and consistent textured 3D models 速度快且一致的纹理3D模型 |
| 9.5 | [9.5] 2504.02763 CanonNet: Canonical Ordering and Curvature Learning for Point Cloud Analysis [{'name': 'Benjy Friedmann, Michael Werman'}] |
Point Cloud Processing 点云处理 | v2 point cloud processing geometry curvature estimation neural networks |
Input: Raw point clouds 原始点云 Step1: Preprocessing pipeline for canonical point ordering 预处理管道用于规范点排序 Step2: Geometric learning framework for curvature estimation 几何学习框架用于曲率估计 Output: Enhanced point cloud features 改进的点云特征 |
| 9.5 | [9.5] 2504.02764 Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model [{'name': 'Shengjun Zhang, Jinzhao Li, Xin Fei, Hao Liu, Yueqi Duan'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D scene generation video diffusion model momentum |
Input: Single image 单幅图像 Step1: Construct noisy samples from original features 从原始特征构建噪声样本 Step2: Introduce pixel-level momentum to generate video 引入像素级动量生成视频 Step3: Iteratively recover a 3D scene 迭代恢复3D场景 Output: High-fidelity 3D scene 高保真3D场景 |
| 9.5 | [9.5] 2504.02817 Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization [{'name': 'Kangle Deng, Hsueh-Ti Derek Liu, Yiheng Zhu, Xiaoxia Sun, Chong Shang, Kiran Bhat, Deva Ramanan, Jun-Yan Zhu, Maneesh Agrawala, Tinghui Zhou'}] |
3D Generation 三维生成 | v2 3D generation 3D生成 autoregressive models 自回归模型 adaptive tokenization 自适应标记化 |
Input: 3D shapes 3D形状 Step1: Adaptive tokenization 自适应标记化 Step2: Octree construction 八叉树构建 Step3: Autoregressive shape generation 自回归形状生成 Output: High-quality 3D content 高质量3D内容 (an octree tokenization sketch follows this table) |
| 9.0 | [9.0] 2504.02480 Graph Attention-Driven Bayesian Deep Unrolling for Dual-Peak Single-Photon Lidar Imaging [{'name': 'Kyungmin Choi, JaKeoung Koo, Stephen McLaughlin, Abderrahim Halimi'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction single-photon Lidar Bayesian modeling dual-peak imaging |
Input: Single-photon Lidar data 单光子激光雷达数据 Step1: Histogram data processing 直方图数据处理 Step2: Dual peak feature extraction 双峰特征提取 Step3: Bayesian modeling and neural network unrolling 贝叶斯建模与神经网络展开 Output: 3D reconstruction results 3D重建结果 |
| 8.5 | [8.5] 2504.02158 UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting [{'name': 'Jaehoon Choi, Dongki Jung, Yonghan Lee, Sungmin Eum, Dinesh Manocha, Heesung Kwon'}] |
3D Reconstruction and Modeling 三维重建 | v2 digital twins 数字孪生 UAV 无人机 3D Gaussian Splatting 3D高斯点云 |
Input: UAV images UAV 图像 Step1: Foreground component synthesis 前景组件合成 Step2: Gaussian splatting integration 结合高斯点云 Step3: Data augmentation 数据增强 Output: Digital twin generation 数字孪生生成 |
| 8.5 | [8.5] 2504.02264 MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception [{'name': 'Wenzhuo Liu, Wenshuo Wang, Yicheng Qiao, Qiannan Guo, Jiayin Zhu, Pengfei Li, Zilong Chen, Huiming Yang, Zhiwei Li, Lening Wang, Tiao Tan, Huaping Liu'}] |
Autonomous Driving 自动驾驶 | v2 multimodal learning driver assistance systems multi-task learning |
Input: Multimodal data (driving context, driver behavior) 多模态数据(驾驶上下文、驾驶员行为) Step1: Multi-axis region attention to extract features 使用多轴区域注意力提取特征 Step2: Dual-branch multimodal embedding to adjust parameters 双分支多模态嵌入调整参数 Step3: Evaluate on AIDE dataset 在AIDE数据集上评估 Output: Improved recognition performance 提升的识别性能 |
| 8.5 | [8.5] 2504.02454 Taylor Series-Inspired Local Structure Fitting Network for Few-shot Point Cloud Semantic Segmentation [{'name': 'Changshuo Wang, Shuting He, Xiang Fang, Meiqing Wu, Siew-Kei Lam, Prayag Tiwari'}] |
Point Cloud Processing 点云处理 | v2 few-shot learning point cloud segmentation 3D reconstruction |
Input: Point clouds and limited labeled data 点云和有限标注数据 Step1: Polynomial fitting for local structure representation 局部结构表示的多项式拟合 Step2: Development of TaylorConv for local structure fitting 开发TaylorConv以进行局部结构拟合 Step3: Constructing variants of TaylorSeg (TaylorSeg-NN, TaylorSeg-PN) 构建TaylorSeg的变体(TaylorSeg-NN,TaylorSeg-PN) Output: Enhanced segmentation of unseen categories 改进的未见类别分割 |
| 8.5 | [8.5] 2504.02517 MultiNeRF: Multiple Watermark Embedding for Neural Radiance Fields [{'name': 'Yash Kulthe, Andrew Gilbert, John Collomosse'}] |
Neural Rendering 神经渲染 | v2 3D watermarking Neural Radiance Fields intellectual property 3D content |
Input: NeRF model with watermarking grid 采用带水印网格的NeRF模型 Step1: Extend TensoRF with watermark grid 扩展TensoRF以包含水印网格 Step2: Implement FiLM-based conditional modulation 实现基于FiLM的条件调制 Step3: Train the model with watermark embedding 训练模型以嵌入水印 Output: NeRF model with multiple watermarks 输出:带有多个水印的NeRF模型 |
| 8.5 | [8.5] 2504.02617 PicoPose: Progressive Pixel-to-Pixel Correspondence Learning for Novel Object Pose Estimation [{'name': 'Lihua Liu, Jiehong Lin, Zhenxin Liu, Kui Jia'}] |
3D Reconstruction and Modeling 三维重建 | v2 pose estimation 3D models correspondence learning |
Input: RGB images and CAD models RGB图像和CAD模型 Step1: Feature matching for coarse correspondences 特征匹配以获得粗略对应 Step2: Global transformation estimation for smooth correspondences 全局变换估计以平滑对应 Step3: Local refinement for fine correspondences 局部细化以优化对应 Output: 6D object pose estimation 6D物体姿态估计 |
| 8.5 | [8.5] 2504.02782 GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation [{'name': 'Zhiyuan Yan, Junyan Ye, Weijia Li, Zilong Huang, Shenghai Yuan, Xiangyang He, Kaiqing Lin, Jun He, Conghui He, Li Yuan'}] |
Image Generation 图像生成 | v2 image generation benchmark GPT-4o |
Input: GPT-4o model outputs Step1: Benchmark creation for evaluation Step2: Qualitative and quantitative analysis of generated images Step3: Comparative study with other models Output: Insights on generative performance and limitations |
| 8.5 | [8.5] 2504.02812 BOP Challenge 2024 on Model-Based and Model-Free 6D Object Pose Estimation [{'name': 'Van Nguyen Nguyen, Stephen Tyree, Andrew Guo, Mederic Fourmy, Anas Gouda, Taeyeop Lee, Sungphill Moon, Hyeontae Son, Lukas Ranftl, Jonathan Tremblay, Eric Brachmann, Bertram Drost, Vincent Lepetit, Carsten Rother, Stan Birchfield, Jiri Matas, Yann Labbe, Martin Sundermeyer, Tomas Hodan'}] |
6D Object Pose Estimation 6D物体位姿估计 | v2 6D pose estimation object detection model-based model-free |
Input: 6D object pose estimation task 6D物体位姿估计任务 Step1: Develop evaluation methodology 开发评估方法 Step2: Introduce new datasets 引入新数据集 Step3: Implement model-based and model-free approaches 实现基于模型和无模型的方法 Output: Results of the BOP Challenge 2024 2024 BOP挑战的结果 |
| 7.5 | [7.5] 2504.02799 Systematic Evaluation of Large Vision-Language Models for Surgical Artificial Intelligence [{'name': 'Anita Rau, Mark Endo, Josiah Aklilu, Jaewoo Heo, Khaled Saab, Alberto Paderno, Jeffrey Jopling, F. Christopher Holsinger, Serena Yeung-Levy'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models surgical AI |
Input: Large Vision-Language Models 视觉语言模型 Step1: Comprehensive analysis of VLMs 对VLM的综合分析 Step2: Performance evaluation on surgical tasks 对外科任务的性能评估 Step3: Insights on adaptability 适应性洞察 Output: Insights for surgical AI 外科人工智能的洞察 |
| 7.5 | [7.5] 2504.02821 Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models [{'name': 'Mateusz Pach, Shyamgopal Karthik, Quentin Bouniot, Serge Belongie, Zeynep Akata'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Sparse Autoencoders Vision-Language Models Interpretability |
Input: Sparse Autoencoders (SAEs) 稀疏自编码器 Step1: Framework introduction 框架介绍 Step2: Monosemanticity evaluation 单义性评估 Step3: Application to VLMs 应用到视觉语言模型 Output: Enhanced interpretability of VLMs 改进的视觉语言模型可解释性 (see the sketch after this table) |
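For the sparse-autoencoder row above (2504.02821), the core mechanism can be illustrated with a minimal sketch: an overcomplete ReLU autoencoder trained on frozen VLM activations with an L1 sparsity penalty, which pushes individual hidden units toward single, monosemantic features. All sizes, names, and coefficients below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder with an L1 sparsity penalty.

    Trained on activations of a frozen VLM; the sparsity pressure
    encourages each hidden unit to fire for one concept. Sizes are
    illustrative, not the paper's.
    """
    def __init__(self, d_model: int = 768, d_hidden: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.encoder(x))  # sparse codes
        return self.decoder(z), z        # reconstruction, codes

def sae_loss(x, x_hat, z, l1_coeff: float = 1e-3):
    recon = torch.mean((x - x_hat) ** 2)        # reconstruction error
    sparsity = torch.mean(z.abs().sum(dim=-1))  # L1 on activations
    return recon + l1_coeff * sparsity

# Toy usage on a batch of fake "VLM activations".
x = torch.randn(32, 768)
sae = SparseAutoencoder()
x_hat, z = sae(x)
sae_loss(x, x_hat, z).backward()
```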
Arxiv 2025-04-03
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | [9.5] 2504.01023 Omnidirectional Depth-Aided Occupancy Prediction based on Cylindrical Voxel for Autonomous Driving [{'name': 'Chaofan Wu, Jiaheng Li, Jinghao Cao, Ming Li, Yongkang Feng, Jiayu Wu, Shuwen Xu, Zihang Gao, Sidan Du, Yang Li'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D perception occupancy prediction autonomous driving cylindrical voxel |
Input: Omnidirectional depth data 全向深度数据 Step1: Build cylindrical voxel representation 构建圆柱体体素表示 Step2: Implement Sketch-Coloring framework 实现素描上色框架 Step3: Evaluate occupancy prediction performance 评估占用预测性能 Output: Enhanced 3D occupancy prediction 改进的3D占用预测 |
| 9.5 | [9.5] 2504.01503 Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment [{'name': 'Ziteng Cui, Xuangeng Chu, Tatsuya Harada'}] |
Neural Rendering 神经渲染 | v2 3D Gaussian Splatting 3D高斯点云 novel view synthesis 新视图合成 lighting adaptation 光照适应 |
Input: Multi-view images 多视角图像 Step1: Image processing with per-view color matrix mapping 使用每视图的颜色矩阵映射进行图像处理 Step2: Curve adjustment to adapt to lighting conditions 曲线调整以适应光照条件 Step3: Joint optimization with 3DGS parameters 与3DGS参数共同优化 Output: Enhanced novel views 改进的新视图 |
| 9.5 | [9.5] 2504.01512 High-fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model [{'name': 'Yiyang Shen, Kun Zhou, He Wang, Yin Yang, Tianjia Shao'}] |
3D Generation 三维生成 | v2 3D generation 三维生成 Gaussian splatting 高斯点云 |
Input: Single-view images 单视图图像 Step1: Feature extraction 特征提取 Step2: Gaussian generation 高斯生成 Step3: 3D reconstruction 3D重建 Output: High-fidelity 3D objects 高保真3D物体 |
| 9.5 | [9.5] 2504.01559 RealityAvatar: Towards Realistic Loose Clothing Modeling in Animatable 3D Gaussian Avatars [{'name': 'Yahui Li, Zhi Zeng, Liming Pang, Guixuan Zhang, Shuwu Zhang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting Dynamic Clothing Modeling Animatable Avatars |
Input: Multi-view videos 多视角视频 Step1: Motion trend modeling 动态趋势建模 Step2: Skeletal feature encoding 骨骼特征编码 Step3: Clothing deformation capture 服装变形捕捉 Output: High-fidelity animatable avatars 高保真动画化虚拟人像 |
| 9.5 | [9.5] 2504.01619 3DBonsai: Structure-Aware Bonsai Modeling Using Conditioned 3D Gaussian Splatting [{'name': 'Hao Wu, Hao Wang, Ruochong Li, Xuran Ma, Hui Xiong'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction bonsai generation Gaussian splatting |
Input: Text descriptions and conditions 输入: 文本描述和条件 Step1: Design trainable 3D space colonization algorithm 第一步: 设计可训练的三维空间殖民算法 Step2: Generate bonsai structures using structure-aware 3D Gaussian splatting 第二步: 使用结构感知的三维高斯点云生成盆栽结构 Step3: Evaluate model with 2D-3D consistency checks 第三步: 使用2D-3D一致性检查评估模型 Output: Complex 3D bonsai models 输出: 复杂的三维盆栽模型 |
| 9.5 | [9.5] 2504.01641 Bridge 2D-3D: Uncertainty-aware Hierarchical Registration Network with Domain Alignment [{'name': 'Zhixin Cheng, Jiacheng Deng, Xinjun Li, Baoqun Yin, Tianzhu Zhang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction image registration point cloud |
Input: Image and point cloud data 图像和点云数据 Step1: Image-to-point cloud registration 图像到点云的配准 Step2: Uncertainty-aware matching 不确定性感知匹配 Step3: Domain alignment 域对齐 Output: Accurate transformations for 3D reconstruction 适用于三维重建的准确变换 |
| 9.5 | [9.5] 2504.01647 FlowR: Flowing from Sparse to Dense 3D Reconstructions [{'name': 'Tobias Fischer, Samuel Rota Bulò, Yung-Hsu Yang, Nikhil Varma Keetha, Lorenzo Porzi, Norman Müller, Katja Schwarz, Jonathon Luiten, Marc Pollefeys, Peter Kontschieder'}] |
3D Reconstruction 三维重建 | v2 3D reconstruction novel view synthesis multi-view flow matching Gaussian splatting |
Input: A set of 2D images of a 3D scene 场景的二维图像集 Step1: Data collection and preprocessing 数据收集与预处理 Step2: 3D reconstruction using 3D Gaussian splatting 采用3D高斯喷溅进行三维重建 Step3: Flow matching to connect sparse and dense renderings 使用流匹配连接稀疏和密集渲染 Output: Improved novel view synthesis and 3D reconstruction 改进的视图合成和三维重建 |
| 9.5 | [9.5] 2504.01732 FIORD: A Fisheye Indoor-Outdoor Dataset with LIDAR Ground Truth for 3D Scene Reconstruction and Benchmarking [{'name': 'Ulas Gunes, Matias Turkulainen, Xuqian Ren, Arno Solin, Juho Kannala, Esa Rahtu'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D scene reconstruction fisheye image dataset |
Input: Fisheye images 鱼眼图像 Step1: Dataset collection 数据集收集 Step2: Point cloud generation 点云生成 Step3: Model evaluation 模型评估 Output: Benchmarking results 基准测试结果 |
| 9.5 | [9.5] 2504.01844 BOGausS: Better Optimized Gaussian Splatting [{'name': 'Stéphane Pateux, Matthieu Gendrin, Luce Morin, Théo Ladune, Xiaoran Jiang'}] |
Neural Rendering 神经渲染 | v2 3D Gaussian Splatting novel view synthesis optimization high-fidelity rendering |
Input: 3D Gaussian Splatting data 3D高斯点云数据 Step1: Analyze training process 分析训练过程 Step2: Propose optimization methodology 提出优化方法 Step3: Model evaluation and comparison 模型评估与比较 Output: Optimized Gaussian models 优化的高斯模型 |
| 9.5 | [9.5] 2504.01872 CoMatcher: Multi-View Collaborative Feature Matching [{'name': 'Jintao Zhang, Zimin Xia, Mingyue Dong, Shuhan Shen, Linwei Yue, Xianwei Zheng'}] |
Multi-view Stereo 多视角立体 | v2 3D reconstruction multi-view matching deep learning feature matching |
Input: Image set of a scene 场景的图像集 Step1: Group images based on co-visibility 根据共视性分组图像 Step2: Collaborative matching using CoMatcher 使用CoMatcher进行协同匹配 Step3: Establish correspondence for 3D reconstruction 建立对应关系以进行3D重建 Output: Reliable multi-view matches 可靠的多视角匹配 |
| 9.5 | [9.5] 2504.01901 Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness [{'name': 'Haochen Wang, Yucheng Zhao, Tiancai Wang, Haoqiang Fan, Xiangyu Zhang, Zhaoxiang Zhang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction visual instruction tuning |
Input: Multi-view images 多视角图像 Step1: Cross-view reconstruction 交叉视图重建 Step2: Global-view reconstruction 全局视图重建 Step3: 3D representation learning 3D 表示学习 Output: Enhanced understanding of 3D scenes 改进的三维场景理解 |
| 9.5 | [9.5] 2504.01956 VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step [{'name': 'Hanyang Wang, Fangfu Liu, Jiawei Chi, Yueqi Duan'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D scene generation video diffusion models sparse views |
Input: Sparse views and corresponding camera poses 输入: 稀疏视图和对应的相机姿态 Step1: Coarse scene generation using a sparse-view 3DGS model 第一步: 使用稀疏视图3DGS模型生成粗略场景 Step2: Rapid distillation through a leap flow strategy 第二步: 通过跃流策略快速蒸馏 Step3: Denoising with a dynamic policy network 第三步: 使用动态策略网络去噪 Output: 3D scenes generated from video input 输出: 从视频输入生成的3D场景 |
| 9.5 | [9.5] 2504.01957 GaussianLSS -- Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting [{'name': 'Shu-Wei Lu, Yi-Hsuan Tsai, Yi-Ting Chen'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D perception Depth estimation Autonomous driving |
Input: Multi-view images 多视角图像 Step1: Implement uncertainty modeling 实现不确定性建模 Step2: Transform depth distribution into 3D Gaussians 将深度分布转化为3D高斯分布 Step3: Rasterize for BEV feature construction 为BEV特征构建进行光栅化 Output: Uncertainty-aware BEV features 不确定性感知的BEV特征 |
| 9.5 | [9.5] 2504.01960 Diffusion-Guided Gaussian Splatting for Large-Scale Unconstrained 3D Reconstruction and Novel View Synthesis [{'name': 'Niluthpol Chowdhury Mithun, Tuan Pham, Qiao Wang, Ben Southall, Kshitij Minhas, Bogdan Matei, Stephan Mandt, Supun Samarasekera, Rakesh Kumar'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction view synthesis Gaussian Splatting multi-view diffusion models |
Input: Multi-view images 多视角图像 Step1: Generate pseudo-observations using a diffusion model 通过扩散模型生成伪观察 Step2: Apply 3D Gaussian Splatting for optimization 使用三维高斯点云进行优化 Step3: Integrate appearance embeddings and depth priors 集成外观嵌入和深度先验 Output: Enhanced 3D reconstruction and novel views 输出:改进的三维重建和新视图 |
| 9.2 | [9.2] 2504.01476 Enhanced Cross-modal 3D Retrieval via Tri-modal Reconstruction [{'name': 'Junlong Ren, Hao Wang'}] |
Cross-modal 3D Retrieval 跨模态3D检索 | v2 3D retrieval multi-view images point clouds text modalities |
Input: Multi-view images and point clouds 多视角图像和点云 Step1: Joint representation of 3D shapes 3D形状的联合表示 Step2: Tri-modal reconstruction 三模态重建 Step3: Fine-grained 2D-3D fusion 细粒度2D-3D融合 Output: Multimodal embeddings with enhanced alignment 输出:增强对齐的多模态嵌入 |
| 9.0 | [9.0] 2504.01596 DEPTHOR: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image [{'name': 'Jijun Xiang, Xuan Zhu, Xianqi Wang, Yu Wang, Hong Zhang, Fei Guo, Xin Yang'}] |
Depth Estimation 深度估计 | v2 depth enhancement dToF 3D reconstruction depth completion |
Input: Raw dToF signals and RGB images 原始dToF信号与RGB图像 Step1: Simulate real-world dToF data using synthetic datasets 使用合成数据集模拟真实世界的dToF数据 Step2: Develop a depth completion network integrating monocular depth estimation (MDE) 开发整合单目深度估计的深度补全网络 Step3: Perform training with noise-robust strategy 使用抗噪声的训练策略进行训练 Output: High-precision dense depth maps 高精度密集深度图 |
| 9.0 | [9.0] 2504.01941 End-to-End Driving with Online Trajectory Evaluation via BEV World Model [{'name': 'Yingyan Li, Yuqi Wang, Yang Liu, Jiawei He, Lue Fan, Zhaoxiang Zhang'}] |
Autonomous Driving 自动驾驶 | v2 autonomous driving trajectory evaluation world model |
Input: Sensor data 传感器数据 Step1: Trajectory prediction 轨迹预测 Step2: Future state prediction 未来状态预测 Step3: Trajectory evaluation 轨迹评估 Output: Optimized trajectories 优化的轨迹 |
| 8.5 | [8.5] 2504.01040 Cal or No Cal? -- Real-Time Miscalibration Detection of LiDAR and Camera Sensors [{'name': 'Ilir Tahiraj, Jeremialie Swadiryus, Felix Fent, Markus Lienkamp'}] |
Autonomous Systems and Robotics 自动驾驶 | v2 miscalibration detection sensor fusion autonomous driving 3D sensing |
Input: LiDAR and camera data 输入: LiDAR和摄像头数据 Step1: Feature extraction 特征提取 Step2: Miscalibration state classification 失调状态分类 Step3: Performance analysis 性能分析 Output: Detection results 检测结果 |
| 8.5 | [8.5] 2504.01298 Direction-Aware Hybrid Representation Learning for 3D Hand Pose and Shape Estimation [{'name': 'Shiyong Liu, Zhihao Li, Xiao Tang, Jianzhuang Liu'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D hand pose estimation direction-aware hybrid features joint optimization motion capture |
Input: RGB images from hand motion capture Step1: Fusion of implicit image features and explicit 2D joint coordinates Step2: Joint optimization of 2D and 3D coordinates Step3: Motion capture confidence calculation based on contrastive learning Output: Improved accuracy in 3D hand pose and shape estimation |
| 8.5 | [8.5] 2504.01428 MuTri: Multi-view Tri-alignment for OCT to OCTA 3D Image Translation [{'name': 'Zhuangzhuang Chen, Hualiang Wang, Chubin Ou, Xiaomeng Li'}] |
3D Image Translation 三维图像翻译 | v2 3D image translation multi-view alignment optical coherence tomography OCTA |
Input: 3D Optical Coherence Tomography (OCT) images 3D光学相干断层扫描图像 Step1: Pre-train VQ-VAE models for OCT & OCTA data 对OCT和OCTA数据进行VQ-VAE模型预训练 Step2: Multi-view tri-alignment to learn mapping from OCT to OCTA using three views 三视角联合对齐学习从OCT到OCTA的映射 Output: Translated 3D OCTA images 翻译后的3D OCTA图像 |
| 8.5 | [8.5] 2504.01449 Multimodal Point Cloud Semantic Segmentation With Virtual Point Enhancement [{'name': 'Zaipeng Duan, Xuzhong Hu, Pei An, Jie Ma'}] |
Point Cloud Processing 点云处理 | v2 Point Cloud Segmentation 点云分割 Multi-modal Integration 多模态集成 |
Input: LiDAR and image data (virtual points) 激光雷达与图像数据(虚拟点) Step1: Integration of virtual points from images 通过图像整合虚拟点 Step2: Adaptive filtering to select valuable pseudo points 采用自适应过滤选择有价值的伪点 Step3: Noise-robust feature extraction 噪声稳健特征提取 Output: Enhanced semantic segmentation results 改进的语义分割结果 |
| 8.5 | [8.5] 2504.01466 Mesh Mamba: A Unified State Space Model for Saliency Prediction in Non-Textured and Textured Meshes [{'name': 'Kaiwei Zhang, Dandan Zhu, Xiongkuo Min, Guangtao Zhai'}] |
3D Reconstruction and Modeling 三维重建 | v2 mesh saliency 3D reconstruction texture integration |
Input: Mesh models 网格模型 Step1: Dataset creation 数据集创建 Step2: Model development 模型开发 Step3: Validation experiments 验证实验 Output: Saliency predictions for meshes 网格的显著性预测 |
| 8.5 | [8.5] 2504.01620 A Conic Transformation Approach for Solving the Perspective-Three-Point Problem [{'name': 'Haidong Wu, Snehal Bhayani, Janne Heikkilä'}] |
3D Reconstruction and Modeling 三维重建 | v2 Perspective-Three-Point problem conic transformation camera pose estimation |
Input: 3D points and their 2D projections 3D点及其2D投影 Step1: Coordinate transformation 坐标变换 Step2: Solving for intersection points 求交点 Step3: Extracting camera pose 提取相机姿态 Output: Camera pose and optimized parameters 输出:相机姿态和优化参数 |
| 8.5 | [8.5] 2504.01648 ProtoGuard-guided PROPEL: Class-Aware Prototype Enhancement and Progressive Labeling for Incremental 3D Point Cloud Segmentation [{'name': 'Haosheng Li, Yuecong Xu, Junjie Chen, Kemi Ding'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D point cloud segmentation 3D点云分割 class-incremental learning 类增量学习 ProtoGuard |
Input: 3D point clouds 3D点云 Step1: Base-class training with prototypes 基类训练与原型 Step2: Novel-class training with pseudo-labels 新类训练与伪标签 Step3: Evaluation of segmentations 分割的评估 Output: Enhanced segmentation accuracy 改进的分割精度 |
| 8.5 | [8.5] 2504.01659 Robust Unsupervised Domain Adaptation for 3D Point Cloud Segmentation Under Source Adversarial Attacks [{'name': 'Haosheng Li, Yuecong Xu, Junjie Chen, Kemi Ding'}] |
3D Point Cloud Processing 点云处理 | v2 3D point cloud segmentation unsupervised domain adaptation adversarial robustness |
Input: 3D point cloud data 3D点云数据 Step1: Adversarial point cloud generation 对抗点云生成 Step2: Dataset formulation 数据集构建 Step3: Framework development 框架开发 Output: Robust segmentation model 稳健的分割模型 |
| 8.5 | [8.5] 2504.01668 Overlap-Aware Feature Learning for Robust Unsupervised Domain Adaptation for 3D Semantic Segmentation [{'name': 'Junjie Chen, Yuecong Xu, Haosheng Li, Kemi Ding'}] |
3D Semantic Segmentation 三维语义分割 | v2 3D semantic segmentation unsupervised domain adaptation autonomous driving |
Input: 3D point cloud data 3D点云数据 Step1: Robustness evaluation 评估鲁棒性 Step2: Invertible attention alignment 构建可逆注意力对齐模块 Step3: Contrastive memory bank construction 构建对比记忆库 Output: Enhanced segmentation performance 改进的分割性能 |
| 8.5 | [8.5] 2504.01764 Dual-stream Transformer-GCN Model with Contextualized Representations Learning for Monocular 3D Human Pose Estimation [{'name': 'Mingrui Ye, Lianping Yang, Hegui Zhu, Zenghao Zheng, Xin Wang, Yantao Lo'}] |
3D Human Pose Estimation 3D人类姿态估计 | v2 3D human pose estimation Transformer GCN |
Input: RGB images and videos from a single viewpoint 使用单一视角的RGB图像和视频 Step1: Masking 2D pose features 对2D姿态特征进行掩蔽 Step2: Learning representations using Transformer-GCN model 使用Transformer-GCN模型学习表示 Step3: Adaptive fusion of features 特征的自适应融合 Output: Enhanced 3D human pose estimations 改进的3D人类姿态估计 |
| 8.0 | [8.0] 2504.01589 Text Speaks Louder than Vision: ASCII Art Reveals Textual Biases in Vision-Language Models [{'name': 'Zhaochen Wang, Yujun Cai, Zi Huang, Bryan Hooi, Yiwei Wang, Ming-Hsuan Yang'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models (VLMs) 视觉语言模型 ASCII Art ASCII艺术 |
Step1: Evaluate five state-of-the-art VLMs on ASCII art tasks 在ASCII艺术任务上评估五个最先进的视觉语言模型 |
| 7.5 | [7.5] 2504.01308 Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks [{'name': 'Jiawei Wang, Yushen Zuo, Yuanjun Chai, Zhendong Liu, Yichen Fu, Yichun Feng, Kin-man Lam'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models Gaussian noise adversarial attacks |
Input: VLMs and noisy visual inputs (e.g., images with Gaussian noise) Step1: Conduct vulnerability analysis of VLMs without noise augmentation Step2: Develop Robust-VLGuard dataset with noise-augmented fine-tuning Step3: Evaluate the performance of enhanced VLMs against adversarial perturbations Output: A robust VLM framework able to handle Gaussian noise and improve functionality (see the sketch after this table) |
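For the Robust-VLGuard row above (2504.01308), the noise-augmented fine-tuning data can be prepared with a small helper like the one below. The function names, noise levels, and pairing scheme are hypothetical illustrations; the paper's exact augmentation schedule may differ.

```python
import numpy as np
from PIL import Image

def add_gaussian_noise(img: Image.Image, sigma: float = 10.0) -> Image.Image:
    """Additive Gaussian pixel noise; sigma is in 8-bit pixel units."""
    arr = np.asarray(img).astype(np.float32)
    noisy = arr + np.random.normal(0.0, sigma, arr.shape)
    return Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))

def augment(images, sigmas=(5.0, 10.0, 20.0)):
    """Yield each clean image plus perturbed variants at several levels."""
    for img in images:
        yield img
        for s in sigmas:
            yield add_gaussian_noise(img, s)

# Demo on a dummy gray image: 1 clean + 3 noisy copies.
demo = Image.new("RGB", (64, 64), (128, 128, 128))
print(len(list(augment([demo]))))
```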
Arxiv 2025-04-02
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | [9.5] 2503.22986 FreeSplat++: Generalizable 3D Gaussian Splatting for Efficient Indoor Scene Reconstruction [{'name': 'Yunsong Wang, Tianxin Huang, Hanlin Chen, Gim Hee Lee'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction Gaussian splatting indoor scene reconstruction multi-view images |
Input: Multi-view images 多视角图像 Step1: Low-cost Cross-View Aggregation framework 低成本跨视角聚合框架 Step2: Pixel-wise triplet fusion method 像素级三重融合方法 Step3: Weighted floater removal strategy 加权漂浮物去除策略 Step4: Depth-regularized per-scene fine-tuning 深度正则化的逐场景微调 Output: Enhanced 3D scene reconstruction 改进的三维场景重建 |
| 9.5 | [9.5] 2503.23022 MeshCraft: Exploring Efficient and Controllable Mesh Generation with Flow-based DiTs [{'name': 'Xianglong He, Junyi Chen, Di Huang, Zexiang Liu, Xiaoshui Huang, Wanli Ouyang, Chun Yuan, Yangguang Li'}] |
Mesh Reconstruction 网格重建 | v2 3D reconstruction mesh generation deep learning |
Input: Raw mesh data 原始网格数据 Step1: Encode meshes into continuous tokens 编码网格为连续标记 Step2: Use flow-based model to generate meshes 使用基于流的模型生成网格 Step3: Output the final mesh based on face control 根据面数控制输出最终网格 |
| 9.5 | [9.5] 2503.23024 Empowering Large Language Models with 3D Situation Awareness [{'name': 'Zhihao Yuan, Yibo Peng, Jinke Ren, Yinghong Liao, Yatong Han, Chun-Mei Feng, Hengshuang Zhao, Guanbin Li, Shuguang Cui, Zhen Li'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 3D scene understanding Vision-Language Models situational awareness |
Input: RGB-D videos RGB-D 视频 Step1: Data collection 数据收集 Step2: Caption generation 标题生成 Step3: Situation grounding 情境定位 Output: Situation-aware dataset 情境感知数据集 |
| 9.5 | [9.5] 2503.23044 CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction [{'name': 'Yuanyuan Gao, Hao Li, Jiaqi Chen, Zhengyu Zou, Zhihang Zhong, Dingwen Zhang, Xiao Sun, Junwei Han'}] |
3D Reconstruction and Modeling 三维重建 | v2 large-scale reconstruction 大规模重建 geometric accuracy 几何准确性 3D scene modeling 三维场景建模 autonomous driving 自动驾驶 |
Input: Multi-view images 多视角图像 Step1: Develop parallelized hybrid hierarchical 3D representation 构建并行化的混合层次三维表示 Step2: Implement batch-level multi-task rendering 采用批量级别的多任务渲染 Step3: Conduct experiments on large-scale datasets 在大规模数据集上进行实验 Output: Enhanced large-scale 3D scene models 改进的大规模三维场景模型 |
| 9.5 | [9.5] 2503.23162 NeuralGS: Bridging Neural Fields and 3D Gaussian Splatting for Compact 3D Representations [{'name': 'Zhenyu Tang, Chaoran Feng, Xinhua Cheng, Wangbo Yu, Junwu Zhang, Yuan Liu, Xiaoxiao Long, Wenping Wang, Li Yuan'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting neural fields 3D reconstruction compression methods multilayer perceptron |
Input: Original 3D Gaussian Splatting (3DGS) data 原始三维高斯点云(3DGS)数据 Step1: Compute Gaussian importance scores 计算高斯重要性分数 Step2: Prune less important Gaussians 修剪不太重要的高斯 Step3: Cluster Gaussians based on attributes 根据属性聚类高斯 Step4: Fit separate MLPs for each cluster 为每个聚类拟合不同的多层感知器 (MLPs) Step5: Fine-tune NeuralGS representation and apply frequency loss 对NeuralGS表示进行微调并应用频率损失 Output: Compact 3D representation with reduced storage requirements 输出:具有减小存储要求的紧凑3D表示 |
| 9.5 | [9.5] 2503.23282 AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos [{'name': 'Felix Wimbauer, Weirong Chen, Dominik Muhle, Christian Rupprecht, Daniel Cremers'}] |
3D Reconstruction and Modeling 三维重建 | v2 camera poses intrinsics 3D reconstruction SfM dynamic videos |
Input: Casual video inputs 随意拍摄的视频输入 Step1: Preprocess video with depth and flow networks 通过深度和流网络预处理视频 Step2: Apply transformer model to estimate camera poses and intrinsics 应用变换器模型估计相机姿态和内参 Step3: Implement trajectory refinement to reduce drift 实施轨迹优化以减少漂移 Output: Accurate camera poses, intrinsics, and 4D pointclouds 输出: 精确的相机姿态、内参和4D点云 |
| 9.5 | [9.5] 2503.23297 ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning [{'name': 'Zhenyang Liu, Yikai Wang, Sixiao Zheng, Tongying Pan, Longfei Liang, Yanwei Fu, Xiangyang Xue'}] |
3D Visual Grounding 三维视觉定位 | v2 3D visual grounding open-vocabulary neural rendering |
Input: Implicit language descriptions 语言描述 Step1: Adaptive grouping based on physical scale 基于物理尺度的自适应分组 Step2: 3D Gaussian feature splatting 3D高斯特征喷涂 Step3: Object localization 物体定位 Output: Accurate 3D grounding and reasoning 精确的3D定位与推理 |
| 9.5 | [9.5] 2503.23337 Enhancing 3D Gaussian Splatting Compression via Spatial Condition-based Prediction [{'name': 'Jingui Ma, Yang Hu, Luyang Tang, Jiayu Yang, Yongqi Zhai, Ronggang Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting Compression Novel View Synthesis Real-time Rendering |
Input: 3D Gaussian representation 3D高斯表示 Step1: Introduce prediction technique 引入预测技术 Step2: Implement spatial condition-based prediction 实施基于空间条件的预测 Step3: Develop instance-aware hyper prior model 开发基于实例感知的超先验模型 Output: Compressed 3D Gaussian models 压缩的3D高斯模型 |
| 9.5 | [9.5] 2503.23463 OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model [{'name': 'Xingcheng Zhou, Xuyuan Han, Feng Yang, Yunpu Ma, Alois C. Knoll'}] |
Autonomous Systems and Robotics 自动驾驶系统与机器人 | v2 Vision-Language Model 视觉-语言模型 Autonomous Driving 自动驾驶 Trajectory Generation 轨迹生成 |
Input: Multimodal inputs (3D environmental perception, vehicle state, driver commands) 输入:多模态输入(3D环境感知、车辆状态、驾驶员命令) Step1: Hierarchical vision-language alignment module 步骤1:分层视觉-语言对齐模块 Step2: Autoregressive interaction modeling 步骤2:自回归交互建模 Step3: Trajectory generation 步骤3:轨迹生成 Output: Reliable driving trajectories 输出:可靠的驾驶轨迹 |
| 9.5 | [9.5] 2503.23502 Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model [{'name': 'Jannik Endres, Oliver Hahn, Charles Corbière, Simone Schaub-Meyer, Stefan Roth, Alexandre Alahi'}] |
Stereo Vision 立体视觉 | v2 omnidirectional stereo matching depth estimation robotics |
Input: Equirectangular images captured by two vertically stacked omnidirectional cameras 由两个垂直堆叠的全向相机拍摄的等距柱状图像 Step1: Integrate the pre-trained monocular depth foundation model into the stereo matching architecture 将预训练的单目深度基础模型集成到立体匹配架构中 Step2: Apply a two-stage training strategy to adapt features to omnidirectional stereo matching 采用两阶段训练策略将特征适应于全景立体匹配 Step3: Fine-tune the model using scale-invariant loss against actual depth data 使用尺度不变损失对实际深度数据微调模型 Output: Enhanced disparity estimation and improved depth accuracy 改进的视差估计和深度准确性 |
| 9.5 | [9.5] 2503.23664 LiM-Loc: Visual Localization with Dense and Accurate 3D Reference Maps Directly Corresponding 2D Keypoints to 3D LiDAR Point Clouds [{'name': 'Masahiko Tsuji, Hitoshi Niigaki, Ryuichi Tanida'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 Visual Localization 视觉定位 3D Reconstruction 三维重建 LiDAR Camera Pose Estimation 相机姿态估计 |
Input: Query image and 3D LiDAR point clouds 查询图像和3D LiDAR点云 Step1: Extract keypoints from the reference image 从参考图像中提取关键点 Step2: Generate a 3D reference map with keypoints using LiDAR 生成包含关键点的3D参考地图,使用LiDAR Step3: Assign 3D LiDAR points directly to 2D keypoints 直接将3D LiDAR点分配给2D关键点 Output: Enhanced camera pose estimation through a dense 3D reference map 输出:通过密集的3D参考地图增强相机姿态估计 |
| 9.5 | [9.5] 2503.23670 Learning Bijective Surface Parameterization for Inferring Signed Distance Functions from Sparse Point Clouds with Grid Deformation [{'name': 'Takeshi Noda, Chao Chen, Junsheng Zhou, Weiqi Zhang, Yu-Shen Liu, Zhizhong Han'}] |
Surface Reconstruction 表面重建 | v2 3D reconstruction signed distance functions sparse point clouds |
Input: Sparse point clouds 稀疏点云 Step1: Learn bijective surface parameterization (BSP) 学习双射表面参数化 Step2: Construct dynamic deformation network 动态变形网络构建 Step3: Optimize grid deformation to refine surfaces 优化网格变形以精炼表面 Output: Signed distance functions (SDF) representation of the surface 表面的符号距离函数表示 |
| 9.5 | [9.5] 2503.23684 Detail-aware multi-view stereo network for depth estimation [{'name': 'Haitao Tian, Junyang Li, Chenxing Wang, Helong Jiang'}] |
Multi-view Stereo 多视角立体 | v2 Multi-view stereo Depth estimation 3D reconstruction Geometric depth |
Input: Multi-view images 多视角图像 Step1: Geometric depth embedding 几何深度嵌入 Step2: Image synthesis loss enhancement 图像合成损失增强 Step3: Adaptive depth interval adjustment 自适应深度区间调整 Output: Accurate depth maps 精确的深度图 |
| 9.5 | [9.5] 2503.23747 Consistency-aware Self-Training for Iterative-based Stereo Matching [{'name': 'Jingyi Zhou, Peng Ye, Haoyu Zhang, Jiakang Yuan, Rao Qiang, Liu YangChenXu, Wu Cailin, Feng Xu, Tao Chen'}] |
Multi-view Stereo 多视角立体 | v2 stereo matching depth estimation 3D vision |
Input: Pairs of rectified images 校正后的图像对 Step1: Introduce consistency-aware self-training framework 引入一致性自我训练框架 Step2: Implement consistency-aware soft filtering module 实现一致性软过滤模块 Step3: Adjust weights of pseudo-labels with soft-weighted loss 使用软加权损失调整伪标签权重 Output: Enhanced stereo matching performance 提升的立体匹配性能 |
| 9.5 | [9.5] 2503.23881 ExScene: Free-View 3D Scene Reconstruction with Gaussian Splatting from a Single Image [{'name': 'Tianyi Gong, Boyan Li, Yifei Zhong, Fangxin Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction single-view reconstruction Gaussian Splatting panoramic image generation |
Input: Single-view image 单视图图像 Step1: Generate panoramic image 生成全景图像 Step2: Depth estimation 深度估计 Step3: 3D Gaussian Splatting model training 训练3D高斯点云模型 Step4: Refinement with video diffusion 通过视频扩散进行优化 Output: Consistent immersive 3D scene 一致的沉浸式3D场景 |
| 9.5 | [9.5] 2503.23965 Video-based Traffic Light Recognition by Rockchip RV1126 for Autonomous Driving [{'name': 'Miao Fan, Xuxu Kong, Shengtong Xu, Haoyi Xiong, Xiangzeng Liu'}] |
Autonomous Driving 自动驾驶 | v2 traffic light recognition autonomous driving neural networks real-time processing |
Input: Multi-frame video data 多帧视频数据 Step1: Temporal data integration 时间数据集成 Step2: Neural network architecture design 神经网络架构设计 Step3: Real-time processing capabilities evaluation 实时处理能力评估 Output: Robust traffic light recognition results 稳健的交通灯识别结果 |
| 9.5 | [9.5] 2503.23993 DenseFormer: Learning Dense Depth Map from Sparse Depth and Image via Conditional Diffusion Model [{'name': 'Ming Yuan, Sichao Wang, Chuang Zhang, Lei He, Qing Xu, Jianqiang Wang'}] |
Depth Estimation 深度估计 | v2 depth completion autonomous driving diffusion model 3D reconstruction |
Input: Sparse depth maps and RGB images 稀疏深度图和 RGB 图像 Step1: Feature extraction 特征提取 Step2: Conditional diffusion process 条件扩散过程 Step3: Multi-step iterative refinement 多步迭代优化 Output: Dense depth map 生成密集深度图 |
| 9.5 | [9.5] 2503.24210 DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting [{'name': 'Seungjun Lee, Gim Hee Lee'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D reconstruction motion deblurring event streams |
Input: Blurry multi-view images and event streams 模糊的多视角图像和事件流 Step1: Optimize deblurring 3DGS 通过联合利用实际捕获的事件流和预训练的扩散模型约束去模糊3DGS Step2: Introduce EDI constraints 引入事件双积分约束 Step3: Leverage diffusion prior 为了进一步改善细节,利用扩散先验 Output: Enhanced 3D representations 改进的3D表示 |
| 9.5 | [9.5] 2503.24229 Pre-training with 3D Synthetic Data: Learning 3D Point Cloud Instance Segmentation from 3D Synthetic Scenes [{'name': 'Daichi Otsuka, Shinichi Mae, Ryosuke Yamada, Hirokatsu Kataoka'}] |
3D Point Cloud Processing 点云处理 | v2 3D point cloud segmentation synthetic data generative models |
Input: 3D point cloud data 3D点云数据 Step1: Data generation 数据生成 Step2: Model training 模型训练 Step3: Model evaluation 模型评估 Output: Improved instance segmentation results 改进的实例分割结果 |
| 9.5 | [9.5] 2503.24374 ERUPT: Efficient Rendering with Unposed Patch Transformer [{'name': 'Maxim V. Shugaev, Vincent Chen, Maxim Karrenbach, Kyle Ashley, Bridget Kennedy, Naresh P. Cuntoor'}] |
3D Reconstruction and Modeling 三维重建 | v2 novel view synthesis scene reconstruction unposed imagery 3D reconstruction computer vision |
Input: Small collections of RGB images 小规模RGB图像集 Step1: Patch-based querying of unposed imagery 基于补丁的无姿势图像查询 Step2: Latent camera pose learning 学习潜在相机姿态 Step3: Efficient model rendering and training 模型的高效渲染和训练 Output: High-quality rendered images 高质量渲染图像 |
| 9.5 | [9.5] 2503.24382 Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views [{'name': 'Chong Bao, Xiyu Zhang, Zehao Yu, Jiale Shi, Guofeng Zhang, Songyou Peng, Zhaopeng Cui'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction neural rendering view synthesis layered Gaussian |
Input: Extremely sparse views (3-4) 极稀疏视图 Step1: Use dense stereo reconstruction to recover coarse geometry 使用稠密立体重建恢复粗糙几何 Step2: Apply layered Gaussian representation for scene modeling 应用分层高斯表示进行场景建模 Step3: Integrate reconstruction and generation iteratively 迭代整合重建与生成 Output: High-quality 3D reconstruction and unbounded view synthesis 输出: 高质量三维重建和无界视图合成 |
| 9.2 | [9.2] 2503.23882 GLane3D: Detecting Lanes with Graph of 3D Keypoints [{'name': 'Halil İbrahim Öztürk, Muhammet Esat Kalfaoğlu, Ozsel Kilinc'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D lane detection 3D车道检测 autonomous driving 自动驾驶 |
Input: Multi-view images 多视角图像 Step1: Keypoint detection 关键点检测 Step2: Sequential connection prediction 顺序连接预测 Step3: Lane extraction 车道提取 Output: Complete 3D lanes 完整的三维车道 |
| 9.2 | [9.2] 2503.24366 StochasticSplats: Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting [{'name': 'Shakiba Kheradmand, Delio Vicini, George Kopanas, Dmitry Lagun, Kwang Moo Yi, Mark Matthews, Andrea Tagliasacchi'}] |
Neural Rendering 神经渲染 | v2 3D Gaussian splatting stochastic rasterization neural rendering |
Input: 3D Gaussian splatting 3D高斯点云 Step1: Implement stochastic rasterization 实现随机光栅化 Step2: Use Monte Carlo estimator 使用蒙特卡罗估计器 Step3: Render using OpenGL shaders 使用OpenGL着色器渲染 Output: Fast and high-quality rendering 快速高质量渲染 (see the sketch after this table) |
| 9.2 | [9.2] 2503.24391 Easi3R: Estimating Disentangled Motion from DUSt3R Without Training [{'name': 'Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, Anpei Chen'}] |
3D Reconstruction and Modeling 三维重建 | v2 4D reconstruction dynamic segmentation camera pose estimation |
Input: Dynamic image collections 动态图像集 Step1: Attention adaptation during inference 推理期间注意力适应 Step2: Dynamic object segmentation 动态目标分割 Step3: Camera pose estimation 相机位姿估计 Output: 4D dense point map reconstruction 4D稠密点图重建 |
| 9.0 | [9.0] 2503.23587 PhysPose: Refining 6D Object Poses with Physical Constraints [{'name': 'Martin Malenický, Martin Cífka, Médéric Fourmy, Louis Montaut, Justin Carpentier, Josef Sivic, Vladimir Petrik'}] |
Robotic Perception 机器人感知 | v2 6D object pose estimation physical constraints robotics scene reconstruction autonomous driving |
Input: Images and geometric scene description (输入: 图像和几何场景描述) Step 1: Estimate initial 6D object poses (步骤 1: 估计初始的 6D 物体姿态) Step 2: Post-process to enforce physical consistency (步骤 2: 后处理以强制物理一致性) Step 3: Evaluate and refine pose estimates (步骤 3: 评估和改进姿态估计) Output: Accurate and physically plausible object poses (输出: 准确且物理上合理的物体姿态) |
| 9.0 | [9.0] 2503.23963 A Benchmark for Vision-Centric HD Mapping by V2I Systems [{'name': 'Miao Fan, Shanshan Yu, Shengtong Xu, Kun Jiang, Haoyi Xiong, Xiangzeng Liu'}] |
3D Reconstruction and Modeling 三维重建 | v2 Vehicle-to-Infrastructure (V2I) HD mapping autonomous driving neural framework vectorized maps |
Input: Collaborative camera frames from vehicles and infrastructures 车辆与基础设施的协作摄像头帧 Step1: Data collection and annotation 数据收集与标注 Step2: Extract features from images 提取图像特征 Step3: Construct BEV representation 构建鸟瞰视图表示 Step4: Generate and update vectorized maps 生成并更新矢量化地图 Output: Vectorized high-definition maps 矢量化高清地图 |
| 8.5 | [8.5] 2503.22963 SuperEIO: Self-Supervised Event Feature Learning for Event Inertial Odometry [{'name': 'Peiyu Chen, Fuling Lin, Weipeng Guan, Peng Lu'}] |
Visual Odometry 视觉里程计 | v2 event camera inertial odometry 3D reconstruction sensor fusion |
Input: Event streams from event cameras 事件相机的事件流 Step1: Event feature detection using CNN 使用CNN进行事件特征检测 Step2: Descriptor matching for loop closure using GNN 使用GNN进行环路闭合的描述符匹配 Step3: Optimize pipeline using TensorRT 使用TensorRT优化管道 Output: Robust event-inertial odometry 可靠的事件惯性里程计 |
| 8.5 | [8.5] 2503.22976 From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D [{'name': 'Jiahui Zhang, Yurui Chen, Yanpeng Zhou, Yueming Xu, Ze Huang, Jilin Mei, Junhui Chen, Yu-Jie Yuan, Xinyue Cai, Guowei Huang, Xingyue Quan, Hang Xu, Li Zhang'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 vision-language models 3D reasoning dataset creation |
Input: 2D images and 3D ground-truth data 2D图像和三维真实数据 Step1: Data generation and annotation 数据生成与标注 Step2: Dataset creation for spatial tasks 数据集创建用于空间任务 Step3: Benchmark development for evaluation 基准开发用于评估 Output: Enhanced spatial reasoning capabilities 改进的空间推理能力 |
| 8.5 | [8.5] 2503.23062 Shape and Texture Recognition in Large Vision-Language Models [{'name': 'Sagi Eppel, Mor Bismut, Alona Faktor'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 shape recognition texture recognition vision-language models 3D understanding |
Input: Real-world images 真实世界图像 Step1: Dataset creation 数据集创建 Step2: Shape and texture recognition tests 形状和纹理识别测试 Step3: Evaluation of large vision-language models 大型视觉语言模型的评估 Output: Performance metrics on shape and texture recognition 形状和纹理识别的性能指标 |
| 8.5 | [8.5] 2503.23105 Open-Vocabulary Semantic Segmentation with Uncertainty Alignment for Robotic Scene Understanding in Indoor Building Environments [{'name': 'Yifan Xu, Vineet Kamat, Carol Menassa'}] |
Robotic Perception 机器人感知 | v2 semantic segmentation autonomous assistive robots |
Input: Built environment scenes 建筑环境场景 Step1: Scene segmentation 场景分割 Step2: Semantic recognition 语义识别 Step3: Uncertainty alignment 不确定性对齐 Output: Adaptive navigation model 自适应导航模型 |
| 8.5 | [8.5] 2503.23109 Uncertainty-Instructed Structure Injection for Generalizable HD Map Construction [{'name': 'Xiaolu Liu, Ruizi Yang, Song Wang, Wentong Li, Junbo Chen, Jianke Zhu'}] |
Autonomous Systems and Robotics 自动驾驶系统与机器人 | v2 HD map construction autonomous vehicles |
Input: HD maps 高精度地图 Step1: Uncertainty resampling 不确定性重采样 Step2: Structural feature extraction 结构特征提取 Step3: Map vectorization 地图矢量化 Output: Generalized HD maps 泛化的高精度地图 |
| 8.5 | [8.5] 2503.23313 SpINR: Neural Volumetric Reconstruction for FMCW Radars [{'name': 'Harshvardhan Takawale, Nirupam Roy'}] |
Volumetric Reconstruction 体积重建 | v2 volumetric reconstruction FMCW radar 3D modeling |
Input: FMCW radar data Step1: Frequency-domain modeling Step2: Implicit neural representation training Step3: 3D volumetric geometry reconstruction Output: High-resolution 3D models |
| 8.5 | [8.5] 2503.23331 HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation [{'name': 'Hongwei Zheng, Han Li, Wenrui Dai, Ziyang Zheng, Chenglin Li, Junni Zou, Hongkai Xiong'}] |
3D Reconstruction 三维重建 | v2 3D human pose estimation occlusion hierarchical poses sparse representation |
Input: Sparse 2D poses from monocular images 单目图像中的稀疏2D姿势 Step1: Multi-scale skeleton tokenization 多尺度骨架标记 Step2: Hierarchical pose generation 分层姿势生成 Step3: 2D-to-3D lifting with generated poses 通过生成的姿势进行2D到3D的提升 Output: Enhanced 3D human poses 改进的3D人体姿势 |
| 8.5 | [8.5] 2503.23365 OnSiteVRU: A High-Resolution Trajectory Dataset for High-Density Vulnerable Road Users [{'name': 'Zhangcun Yan, Jianqing Li, Peng Hang, Jian Sun'}] |
Autonomous Systems and Robotics 自动驾驶系统与机器人学 | v2 High-resolution trajectory data 高分辨率轨迹数据 Vulnerable Road Users 弱势交通参与者 Autonomous driving 自动驾驶 |
Input: High-resolution trajectory data 高分辨率轨迹数据 Step1: Data collection 数据收集 Step2: Data integration 数据集成 Step3: Analysis of behavioral patterns 行为模式分析 Output: Comprehensive dataset for autonomous driving systems 输出: 自主驾驶系统的综合数据集 |
| 8.5 | [8.5] 2503.23519 BoundMatch: Boundary detection applied to semi-supervised segmentation for urban-driving scenes [{'name': 'Haruya Ishikawa, Yoshimitsu Aoki'}] |
Autonomous Systems and Robotics 自动驾驶 | v2 semi-supervised segmentation boundary detection autonomous driving |
Input: Unlabeled images and labeled data; 输入: 无标签图像和有标签数据 Step1: Implement Boundary-Semantic Fusion to combine boundary cues with segmentation; 步骤1: 实施边界-语义融合,将边界线索与分割结合 Step2: Integrate Boundary Consistency Regularized Multi-Task Learning; 步骤2: 集成边界一致性正则化多任务学习 Step3: Evaluate model performance on various datasets; 步骤3: 在各种数据集上评估模型性能 Output: Enhanced segmentation masks with improved boundary delineation; 输出: 改进的分割掩码,具有更好的边界划分 |
| 8.5 | [8.5] 2503.23577 Multiview Image-Based Localization [{'name': 'Cameron Fiore, Hongyi Fan, Benjamin Kimia'}] |
3D Localization 3D定位 | v2 3D localization image retrieval autonomous driving multiview correspondences |
Input: Query image and anchor images 查询图像和锚图像 Step1: Compute NetVLAD descriptors and SuperPoint features 计算NetVLAD描述符和SuperPoint特征 Step2: Retrieve top-K anchor images 根据特征描述符检索前K个锚图像 Step3: Estimate relative poses 估计相对位姿 Step4: Find optimal camera center and orientation 查找最佳相机中心和方向 Output: Accurate pose estimation 输出:准确的位姿估计 |
| 8.5 | [8.5] 2503.23606 Blurry-Edges: Photon-Limited Depth Estimation from Defocused Boundaries [{'name': 'Wei Xu, Charles James Wagner, Junjie Luo, Qi Guo'}] |
Depth Estimation 深度估计 | v2 depth estimation depth from defocus neural networks |
Input: Pair of differently defocused images 输入:一对不同散焦的图像 Step1: Data representation through Blurry-Edges 通过模糊边缘进行数据表示 Step2: Depth calculation using closed-form DfD relation 使用封闭形式的DfD关系计算深度 Output: Depth estimation along the boundaries 输出:沿边界的深度估计 |
| 8.5 | [8.5] 2503.23647 Introducing the Short-Time Fourier Kolmogorov Arnold Network: A Dynamic Graph CNN Approach for Tree Species Classification in 3D Point Clouds [{'name': 'Said Ohamouddou, Mohamed Ohamouddou, Hanaa El Afia, Abdellatif El Afia, Rafik Lasri, Raddouane Chiheb'}] |
3D Point Cloud Processing 点云处理 | v2 3D point cloud tree species classification deep learning STFT-KAN |
Input: 3D point clouds 3D点云 Step1: Implementation of STFT-KAN STFT-KAN的实现 Step2: Model training and evaluation 模型训练与评估 Output: Tree species classification results 树种分类结果 |
| 8.5 | [8.5] 2503.23702 3D Dental Model Segmentation with Geometrical Boundary Preserving [{'name': 'Shufan Xi, Zexian Liu, Junlin Chang, Hongyu Wu, Xiaogang Wang, Aimin Hao'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D reconstruction segmentation digital dentistry computer vision |
Input: 3D intraoral scan mesh 3D口腔扫描网格 Step1: Selective downsampling method 选择性下采样方法 Step2: Boundary feature extraction 边界特征提取 Step3: Model evaluation 模型评估 Output: Improved segmentation accuracy 改进的分割精度 |
| 8.5 | [8.5] 2503.23980 SALT: A Flexible Semi-Automatic Labeling Tool for General LiDAR Point Clouds with Cross-Scene Adaptability and 4D Consistency [{'name': 'Yanbo Wang, Yongtao Chen, Chuan Cao, Tianchen Deng, Wentao Zhao, Jingchuan Wang, Weidong Chen'}] |
3D Reconstruction and Modeling 三维重建 | v2 LiDAR Semantic segmentation Zero-shot learning |
Input: General LiDAR point clouds 一般LiDAR点云 Step1: Data transformation 数据转换 Step2: Zero-shot learning paradigm implementation 零样本学习范式实现 Step3: Pre-segmentation result generation 预分割结果生成 Output: Enhanced annotation efficiency 改进的注释效率 |
| 8.5 | [8.5] 2503.24091 4D mmWave Radar in Adverse Environments for Autonomous Driving: A Survey [{'name': 'Xiangyuan Peng, Miao Tang, Huawei Sun, Lorenzo Servadei, Robert Wille'}] |
Autonomous Driving 自动驾驶 | v2 4D mmWave radar autonomous driving adverse environments |
Input: 4D mmWave radar data 4D毫米波雷达数据 Step1: Review of existing datasets 现有数据集的回顾 Step2: Analysis of methods and models 方法和模型的分析 Step3: Discussion on challenges and future directions 挑战与未来方向的讨论 Output: Comprehensive survey report 综合调查报告 |
| 8.5 | [8.5] 2503.24129 It's a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data [{'name': 'Dominik Schnaus, Nikita Araslanov, Daniel Cremers'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 vision-language correspondence unsupervised learning quadratic assignment problem |
Input: Vision and language embeddings 视觉和语言嵌入 Step1: Formulate unsupervised matching as a quadratic assignment problem 将无监督匹配形式化为二次分配问题 Step2: Develop a heuristic for matching 提出匹配的启发式方法 Step3: Evaluate on datasets 在数据集上评估 Output: Unsupervised classification results without annotations 无需注释的无监督分类结果 |
| 8.5 | [8.5] 2503.24270 Visual Acoustic Fields [{'name': 'Yuelei Li, Hyunjin Kim, Fangneng Zhan, Ri-Zhao Qiu, Mazeyu Ji, Xiaojun Shan, Xueyan Zou, Paul Liang, Hanspeter Pfister, Xiaolong Wang'}] |
Neural Rendering 神经渲染 | v2 3D Gaussian Splatting sound localization sound generation |
Input: Multiscale features from 3D Gaussian Splatting (3DGS) Step1: Sound generation module utilizing a conditional diffusion model Step2: Sound localization module for querying impact positions in 3D scene Output: Generated sounds and localized impact sources in 3D space |
| 8.5 | [8.5] 2503.24306 Point Tracking in Surgery–The 2024 Surgical Tattoos in Infrared (STIR) Challenge [{'name': 'Adam Schmidt, Mert Asim Karaoglu, Soham Sinha, Mingang Jang, Ho-Gun Ha, Kyungmin Jung, Kyeongmo Gu, Ihsan Ullah, Hyunki Lee, Jonáš Šerých, Michal Neoral, Jiří Matas, Rulin Zhou, Wenlong He, An Wang, Hongliang Ren, Bruno Silva, Sandro Queirós, Estêvão Lima, João L. Vilaça, Shunsuke Kikuchi, Atsushi Kouno, Hiroki Matsuzaki, Tongtong Li, Yulu Chen, Ling Li, Xiang Ma, Xiaojian Li, Mona Sheikh Zeinoddin, Xu Wang, Zafer Tandogdu, Greg Shaw, Evangelos Mazomenos, Danail Stoyanov, Yuxin Chen, Zijian Wu, Alexander Ladikos, Simon DiMaio, Septimiu E. Salcudean, Omid Mohareri'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction point tracking surgery algorithms |
Input: Infrared video sequences 红外视频序列 Step1: Data quantification 数据量化 Step2: Algorithm submission 提交算法 Step3: Performance evaluation 性能评估 Output: Algorithm performance metrics 算法性能指标 |
| 8.5 | [8.5] 2503.24381 UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving [{'name': 'Yuping Wang, Xiangyu Huang, Xiaokang Sun, Mingxuan Yan, Shuo Xing, Zhengzhong Tu, Jiachen Li'}] |
Autonomous Driving 自动驾驶 | v2 Occupancy forecasting 占用预测 Autonomous driving 自动驾驶 3D Occupancy labels 三维占用标签 |
Input: Camera images 摄像头图像 Step1: Data integration 数据集成 Step2: Occupancy forecasting 占用预测 Step3: Evaluation of performance 性能评估 Output: Occupancy predictions 预测的占用情况 |
| 8.0 | [8.0] 2503.22932 Bi-Level Multi-View fuzzy Clustering with Exponential Distance [{'name': 'Kristina P. Sinaga'}] |
Multi-view Stereo 多视角立体 | v2 multi-view clustering fuzzy c-means exponential distance |
Input: Multi-view data 多视角数据 Step1: Extend fuzzy c-means clustering 扩展模糊c均值聚类 Step2: Incorporate heat-kernel coefficients 引入热核系数 Step3: Develop bi-level clustering algorithm 开发双层聚类算法 Output: Enhanced clustering results 改进的聚类结果 |
| 7.5 | [7.5] 2503.23131 RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning [{'name': 'Alexander Vogel, Omar Moured, Yufan Chen, Jiaming Zhang, Rainer Stiefelhagen'}] |
VLM & VLA 视觉语言模型与视觉语言对齐 | v2 Vision-Language Models chart understanding visual grounding |
Input: Chart images 图表图像 Step1: Data collection 数据收集 Step2: Instruction tuning 指令调优 Step3: Visual grounding implementation 视觉定位实现 Output: RefChartQA dataset and model outputs RefChartQA数据集及模型输出 |
| 7.5 | [7.5] 2503.23452 VideoGen-Eval: Agent-based System for Video Generation Evaluation [{'name': 'Yuhang Yang, Ke Fan, Shangkun Sun, Hongxiang Li, Ailing Zeng, FeiLin Han, Wei Zhai, Wei Liu, Yang Cao, Zheng-Jun Zha'}] |
Image and Video Generation 图像生成与视频生成 | v2 video generation evaluation system |
Input: Video generation prompts 视频生成提示 Step1: Content structuring 内容结构化 Step2: Content judgment 内容评估 Step3: Dynamic evaluation tools 动态评估工具 Output: Evaluation results 评估结果 |
| 7.5 | [7.5] 2503.23573 DASH: Detection and Assessment of Systematic Hallucinations of VLMs [{'name': 'Maximilian Augustin, Yannic Neuhaus, Matthias Hein'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 VLMs object hallucination evaluation |
Input: Real-world images 真实世界的图像 Step1: Data retrieval 数据检索 Step2: Systematic hallucination detection 系统性幻觉检测 Output: Clusters of hallucinated images 幻觉图像的聚类 |
| 6.5 | [6.5] 2503.23508 Re-Aligning Language to Visual Objects with an Agentic Workflow [{'name': 'Yuming Chen, Jiangyan Feng, Haodong Zhang, Lijun Gong, Feng Zhu, Rui Zhao, Qibin Hou, Ming-Ming Cheng, Yibing Song'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 language-object alignment vision-language models |
Input: Detected objects and raw language expressions 检测到的对象和原始语言表达 Step1: Reasoning state and planning 状态推理与规划 Step2: Adaptive prompt adjustment 自适应提示调整 Step3: Feedback analysis from LLM LLM反馈分析 Output: Re-aligned language expressions 重新对齐的语言表达 |
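For the StochasticSplats row above (2503.24366), the sorting-free Monte Carlo idea can be checked numerically in a few lines: treat each fragment as fully opaque with probability alpha, let the nearest surviving fragment win the pixel, and average over samples; the expectation equals ordered alpha compositing. The single-pixel NumPy toy below illustrates that estimator and is not the paper's GPU rasterizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fragments covering one pixel: depth, opacity, scalar color.
depth = np.array([0.9, 0.2, 0.5])
alpha = np.array([0.6, 0.3, 0.5])
color = np.array([1.0, 0.4, 0.7])
background = 0.0

def sorted_over():
    """Reference: front-to-back ordered alpha compositing."""
    c, transmittance = 0.0, 1.0
    for i in np.argsort(depth):  # near-to-far
        c += transmittance * alpha[i] * color[i]
        transmittance *= 1.0 - alpha[i]
    return c + transmittance * background

def stochastic_sample():
    """One sorting-free sample: each fragment is opaque with
    probability alpha; the nearest survivor wins the pixel."""
    keep = rng.random(alpha.shape) < alpha
    if not keep.any():
        return background
    return color[keep][np.argmin(depth[keep])]

samples = [stochastic_sample() for _ in range(200_000)]
print(sorted_over(), np.mean(samples))  # agree up to MC noise
```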
Arxiv 2025-04-01
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | [9.5] 2503.22986 FreeSplat++: Generalizable 3D Gaussian Splatting for Efficient Indoor Scene Reconstruction [{'name': 'Yunsong Wang, Tianxin Huang, Hanlin Chen, Gim Hee Lee'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction indoor scenes |
Input: Multi-view images 多视角图像 Step1: Low-cost Cross-View Aggregation framework 低成本交叉视图聚合框架 Step2: Pixel-wise triplet fusion method 像素级三元组融合方法 Step3: Weighted floater removal strategy 加权漂浮物去除策略 Step4: Depth-regularized per-scene fine-tuning 深度正则化的逐场景微调 Output: Enhanced 3D Gaussian primitives 改进的三维高斯原语 |
| 9.5 | [9.5] 2503.23044 CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction [{'name': 'Yuanyuan Gao, Hao Li, Jiaqi Chen, Zhengyu Zou, Zhihang Zhong, Dingwen Zhang, Xiao Sun, Junwei Han'}] |
3D Reconstruction and Modeling 3D重建与建模 | v2 large-scale scene reconstruction 大规模场景重建 3D Gaussian Splatting 3D高斯点 multi-GPU rendering 多GPU渲染 |
Input: Multi-view images 多视角图像 Step1: Dynamic voxel allocation 动态体素分配 Step2: Batch rendering techniques 批量渲染技术 Step3: Parallel training and rendering 并行训练与渲染 Output: High-fidelity 3D models 高保真3D模型 |
| 9.5 | [9.5] 2503.23162 NeuralGS: Bridging Neural Fields and 3D Gaussian Splatting for Compact 3D Representations [{'name': 'Zhenyu Tang, Chaoran Feng, Xinhua Cheng, Wangbo Yu, Junwu Zhang, Yuan Liu, Xiaoxiao Long, Wenping Wang, Li Yuan'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D Gaussian Splatting 3D高斯点云 Neural Rendering 神经渲染 3D Reconstruction 三维重建 |
Input: Original 3D Gaussian Splatting 3DGS 原始3D高斯点云 Step1: Importance calculation 重要性计算 Step2: Gaussian clustering 高斯聚类 Step3: Tiny MLP fitting 小型多层感知机拟合 Output: Compact 3D representation 紧凑的3D表示 (see the sketch after this table) |
| 9.5 | [9.5] 2503.23282 AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos [{'name': 'Felix Wimbauer, Weirong Chen, Dominik Muhle, Christian Rupprecht, Daniel Cremers'}] |
3D Reconstruction and Modeling 三维重建 | v2 camera poses intrinsics 3D reconstruction dynamic scenes |
Input: Dynamic video sequences 动态视频序列 Step1: Predict camera poses and intrinsics 预测相机姿态和内参 Step2: Apply uncertainty-based loss formulation 应用基于不确定性的损失函数 Step3: Perform trajectory refinement 进行轨迹优化 Output: High-quality 4D pointclouds 高质量4D点云 |
| 9.5 | [9.5] 2503.23297 ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning [{'name': 'Zhenyang Liu, Yikai Wang, Sixiao Zheng, Tongying Pan, Longfei Liang, Yanwei Fu, Xiangyang Xue'}] |
3D Visual Grounding 3D视觉定位 | v2 3D grounding language models Gaussian features |
Input: Implicit language descriptions 语言描述 Step1: Use 3D Gaussian feature fields 使用3D高斯特征场 Step2: Adaptive grouping based on object scale 根据物体尺度进行自适应分组 Step3: Localize occluded objects 进行遮挡物体定位 Output: Enhanced 3D grounding 改进的三维定位 |
| 9.5 | [9.5] 2503.23463 OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model [{'name': 'Xingcheng Zhou, Xuyuan Han, Feng Yang, Yunpu Ma, Alois C. Knoll'}] |
Autonomous Driving 自动驾驶 | v2 Vision-Language Models 视觉语言模型 Autonomous Driving 自动驾驶 |
Input: Multimodal inputs including 3D environmental perception 3D环境感知, ego vehicle states 自我车辆状态, and driver commands 驾驶员命令 Step1: Hierarchical vision-language alignment process 层次化视觉语言对齐过程 Step2: Model generates driving trajectories 生成驾驶轨迹 Step3: Evaluate agent-env-ego interactions 评估主体-环境-自我交互 Output: Reliable driving actions 可靠的驾驶动作 |
| 9.5 | [9.5] 2503.23502 Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model [{'name': 'Jannik Endres, Oliver Hahn, Charles Corbière, Simone Schaub-Meyer, Stefan Roth, Alexandre Alahi'}] |
Stereo Matching 立体匹配 | v2 omnidirectional stereo matching depth estimation mobile robotics 3D reconstruction |
Input: Equirectangular images captured with two vertically stacked omnidirectional cameras 通过两个垂直堆叠的全向相机采集的等距图像 Step1: Integrate depth foundation model into stereo matching architecture 将深度基础模型集成到立体匹配架构中 Step2: Two-stage training: Adapt stereo matching head and fine-tune foundation model 两阶段训练:调整立体匹配头和微调基础模型 Step3: Evaluate performance on real-world datasets 在真实数据集上评估性能 Output: Enhanced disparity estimation and 3D depth maps 改进的视差估计和三维深度图 |
| 9.5 | [9.5] 2503.23664 LiM-Loc: Visual Localization with Dense and Accurate 3D Reference Maps Directly Corresponding 2D Keypoints to 3D LiDAR Point Clouds [{'name': 'Masahiko Tsuji, Hitoshi Niigaki, Ryuichi Tanida'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D reconstruction LiDAR |
Input: 2D keypoints from reference images 参考图像中的2D关键点 Step1: Generate 3D reference map using 3D reconstruction 使用3D重建生成3D参考地图 Step2: Assign 3D LiDAR point clouds to keypoints 将3D LiDAR点云分配给关键点 Step3: Improve pose estimation accuracy 提高姿态估计准确性 Output: Dense and accurate 3D reference maps 密集且准确的3D参考地图 |
| 9.5 | [9.5] 2503.23670 Learning Bijective Surface Parameterization for Inferring Signed Distance Functions from Sparse Point Clouds with Grid Deformation [{'name': 'Takeshi Noda, Chao Chen, Junsheng Zhou, Weiqi Zhang, Yu-Shen Liu, Zhizhong Han'}] |
3D Reconstruction and Modeling 三维重建 | v2 Surface Reconstruction 表面重建 Signed Distance Functions 有符号距离函数 Sparse Point Clouds 稀疏点云 |
Input: Sparse point clouds 稀疏点云 Step1: Learn dynamic deformation network 学习动态变形网络 Step2: Bijective surface parameterization (BSP) learning 学习双射表面参数化 Step3: Grid deformation optimization (GDO) 应用网格变形优化 Output: Continuous signed distance functions (SDF) 生成连续的有符号距离函数 |
| 9.5 | [9.5] 2503.23684 Detail-aware multi-view stereo network for depth estimation [{'name': 'Haitao Tian, Junyang Li, Chenxing Wang, Helong Jiang'}] |
Multi-view Stereo 多视角立体 | v2 depth estimation multi-view stereo 3D reconstruction image synthesis |
Input: Multi-view images 多视角图像 Step1: Data integration 数据集成 Step2: Geometric depth embedding 几何深度嵌入 Step3: Adaptive depth interval adjustment 自适应深度间隔调整 Output: Accurate depth maps 精确的深度图 |
| 9.5 | [9.5] 2503.23747 Consistency-aware Self-Training for Iterative-based Stereo Matching [{'name': 'Jingyi Zhou, Peng Ye, Haoyu Zhang, Jiakang Yuan, Rao Qiang, Liu YangChenXu, Wu Cailin, Feng Xu, Tao Chen'}] |
Stereo Matching 立体匹配 | v2 stereo matching self-training depth estimation computer vision autonomous driving |
Input: Stereo image pairs 立体图像对 Step1: Reliability evaluation 可靠性评估 Step2: Soft filtering of pseudo-labels 伪标签软过滤 Step3: Model training with weighted loss 使用加权损失进行模型训练 Output: Enhanced stereo matching results 改进的立体匹配结果 |
| 9.5 | [9.5] 2503.23881 ExScene: Free-View 3D Scene Reconstruction with Gaussian Splatting from a Single Image [{'name': 'Tianyi Gong, Boyan Li, Yifei Zhong, Fangxin Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction single-view image reconstruction Gaussian Splatting panoramic image generation |
Input: Single-view image 单视图图像 Step1: Generate panoramic image 生成全景图像 Step2: Depth estimation 深度估计 Step3: Train initial 3D Gaussian Splatting model 训练初始3D高斯点云模型 Step4: GS refinement with video diffusion priors 使用视频扩散先验进行GS优化 Output: Enhanced immersive 3D scene 改进的沉浸式3D场景 |
| 9.5 | [9.5] 2503.23882 GLane3D: Detecting Lanes with Graph of 3D Keypoints [{'name': 'Halil İbrahim Öztürk, Muhammet Esat Kalfaoğlu, Ozsel Kilinc'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D lane detection autonomous driving |
Input: 3D lane data 3D车道数据 Step1: Keypoint detection 关键点检测 Step2: Connection prediction 连接预测 Step3: Lane construction 车道构建 Output: 3D lanes 3D车道 |
| 9.5 | [9.5] 2503.23963 A Benchmark for Vision-Centric HD Mapping by V2I Systems [{'name': 'Miao Fan, Shanshan Yu, Shengtong Xu, Kun Jiang, Haoyi Xiong, Xiangzeng Liu'}] |
Autonomous Driving 自动驾驶 | v2 HD mapping vehicle-to-infrastructure autonomous driving |
Input: Collaborative camera frames from vehicles and infrastructure 车辆和基础设施的协作摄像头帧 Step1: Data collection and annotation 数据收集和标注 Step2: Feature extraction 特征提取 Step3: Map encoding and decoding 地图编码和解码 Output: Vectorized high-definition maps 向量化高精度地图 |
| 9.5 | [9.5] 2503.23965 Video-based Traffic Light Recognition by Rockchip RV1126 for Autonomous Driving [{'name': 'Miao Fan, Xuxu Kong, Shengtong Xu, Haoyi Xiong, Xiangzeng Liu'}] |
Autonomous Driving 自动驾驶 | v2 traffic light recognition autonomous driving real-time processing end-to-end neural network |
Input: Video frames from ego cameras 来自自我摄像机的视频帧 Step1: Multi-frame processing 多帧处理 Step2: Traffic light detection and classification 交通信号灯检测和分类 Step3: Integration with HD maps 与高清地图集成 Output: Real-time traffic light recognition 实时交通信号灯识别 |
| 9.5 | [9.5] 2503.23993 DenseFormer: Learning Dense Depth Map from Sparse Depth and Image via Conditional Diffusion Model [{'name': 'Ming Yuan, Sichao Wang, Chuang Zhang, Lei He, Qing Xu, Jianqiang Wang'}] |
Depth Estimation 深度估计 | v2 depth completion 深度补全 autonomous driving 自动驾驶 conditional diffusion model 条件扩散模型 |
Input: Sparse depth maps and RGB images 稀疏深度图和RGB图像 Step1: Data integration 数据集成 Step2: Conditional depth denoising using diffusion model 条件深度去噪 Step3: Multi-step iterative refinement 多步迭代优化 Output: Enhanced dense depth maps 改进的稠密深度图 |
| 9.5 | [9.5] 2503.24210 DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting [{'name': 'Seungjun Lee, Gim Hee Lee'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D Gaussian Splatting motion deblurring event streams novel view synthesis |
Input: Blurry multi-view images and event streams 模糊多视角图像及事件流 Step1: Optimize deblurring 3DGS using event streams and diffusion prior 优化去模糊3DGS,利用事件流和扩散先验 Step2: Enhance edge details and color accuracy 强化边缘细节和颜色准确性 Output: Improved sharp 3D representations 改进的锐利3D表示 |
| 9.5 | [9.5] 2503.24366 StochasticSplats: Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting [{'name': 'Shakiba Kheradmand, Delio Vicini, George Kopanas, Dmitry Lagun, Kwang Moo Yi, Mark Matthews, Andrea Tagliasacchi'}] |
Neural Rendering 神经渲染 | v2 3D Gaussian splatting stochastic rasterization neural rendering volume rendering |
Input: 3D Gaussian splatting data 3D高斯点云数据 Step1: Integrate stochastic rasterization techniques 整合随机光栅化技术 Step2: Implement unbiased Monte Carlo estimator 实现无偏蒙特卡洛估计器 Step3: Optimize rendering performance 优化渲染性能 Output: Efficient 3D rendering output 高效的三维渲染输出 |
| 9.5 | [9.5] 2503.24374 ERUPT: Efficient Rendering with Unposed Patch Transformer [{'name': 'Maxim V. Shugaev, Vincent Chen, Maxim Karrenbach, Kyle Ashley, Bridget Kennedy, Naresh P. Cuntoor'}] |
3D Reconstruction and Modeling 3D重建与建模 | v2 3D reconstruction 3D重建 novel view synthesis 新视图合成 efficient rendering 高效渲染 |
Input: Collections of RGB images RGB图像集 Step1: Patch-based querying using unposed imagery 基于补丁的查询,使用未定位的图像 Step2: Model training with learned latent camera pose 模型训练,使用学习到的潜在相机姿态 Step3: Efficient rendering at high frame rates 实现高帧率的高效渲染 Output: Novel view synthesis of 3D scenes 3D场景的新视图合成 |
| 9.5 | [9.5] 2503.24382 Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views [{'name': 'Chong Bao, Xiyu Zhang, Zehao Yu, Jiale Shi, Guofeng Zhang, Songyou Peng, Zhaopeng Cui'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction unbounded view synthesis neural rendering |
Input: Extremely sparse views (3-4) 极少视角输入(3-4) Step1: Employ dense stereo reconstruction model to recover coarse geometry 使用密集立体重建模型恢复粗略几何 Step2: Introduce layered Gaussian-based representation to model scenes 引入分层高斯表示来建模场景 Step3: Perform bootstrap optimization for noise refinement and occlusion filling 进行引导优化以消除噪声和填补遮挡区域 Step4: Implement iterative fusion of reconstruction and generation 进行重建与生成的迭代融合 Output: High-quality 3D reconstruction and novel view synthesis 输出:高质量的三维重建和新视图合成 |
| 9.5 | [9.5] 2503.24391 Easi3R: Estimating Disentangled Motion from DUSt3R Without Training [{'name': 'Xingyu Chen, Yue Chen, Yuliang Xiu, Andreas Geiger, Anpei Chen'}] |
3D Reconstruction and Modeling 三维重建 | v2 4D reconstruction camera pose estimation dynamic scenes |
Input: Dynamic video footage 动态视频 Step1: Attention map analysis 注意力图分析 Step2: Motion disentanglement 运动解耦 Step3: Point cloud reconstruction 点云重建 Output: Segmented dynamic regions and camera parameters 分割的动态区域和相机参数 |
| 9.2 | [9.2] 2503.23024 Empowering Large Language Models with 3D Situation Awareness [{'name': 'Zhihao Yuan, Yibo Peng, Jinke Ren, Yinghong Liao, Yatong Han, Chun-Mei Feng, Hengshuang Zhao, Guanbin Li, Shuguang Cui, Zhen Li'}] |
3D Scene Understanding 3D场景理解 | v2 3D situation awareness 3D情境意识 Vision-Language Models 视觉语言模型 Large Language Models 大型语言模型 |
Input: RGB-D videos RGB-D 视频 Step1: Data collection 数据收集 Step2: Dataset generation 数据集生成 Step3: Situation grounding module integration 情境基础模块集成 Output: Enhanced 3D situational awareness 改进的三维情境感知 |
| 9.2 | [9.2] 2503.23109 Uncertainty-Instructed Structure Injection for Generalizable HD Map Construction [{'name': 'Xiaolu Liu, Ruizi Yang, Song Wang, Wentong Li, Junbo Chen, Jianke Zhu'}] |
Autonomous Driving 自动驾驶 | v2 HD map construction autonomous driving |
Input: Images from onboard cameras 车载摄像头图像 Step 1: Feature extraction 特征提取 Step 2: Uncertainty-aware detection 不确定性感知检测 Step 3: Map vectorization 地图向量化 Output: Generalized HD maps 泛化的高清地图 |
| 9.2 | [9.2] 2503.23313 SpINR: Neural Volumetric Reconstruction for FMCW Radars [{'name': 'Harshvardhan Takawale, Nirupam Roy'}] |
3D Reconstruction and Modeling 三维重建 | v2 volumetric reconstruction neural representation radar imaging |
Input: FMCW radar data 频率调制连续波雷达数据 Step1: Construct frequency-domain model 构建频率域模型 Step2: Integrate neural representations 集成神经表示 Step3: Perform volumetric reconstruction 进行体积重建 Output: High-resolution 3D scenes 高分辨率3D场景 |
| 9.2 | [9.2] 2503.24229 Pre-training with 3D Synthetic Data: Learning 3D Point Cloud Instance Segmentation from 3D Synthetic Scenes [{'name': 'Daichi Otsuka, Shinichi Mae, Ryosuke Yamada, Hirokatsu Kataoka'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D point cloud instance segmentation synthetic data autonomous driving |
Input: 3D point cloud data 3D点云数据 Step1: Pre-training with synthetic data 使用合成数据进行预训练 Step2: Instance segmentation model training 实例分割模型训练 Step3: Evaluation of segmentation performance 分割性能评估 Output: Enhanced 3D instance segmentation model 改进的三维实例分割模型 |
| 9.0 | [9.0] 2503.23337 Enhancing 3D Gaussian Splatting Compression via Spatial Condition-based Prediction [{'name': 'Jingui Ma, Yang Hu, Luyang Tang, Jiayu Yang, Yongqi Zhai, Ronggang Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting compression prediction technique |
Input: 3D Gaussian Splatting data 3D高斯点云数据 Step1: Introduce prediction technique 引入预测技术 Step2: Compress using spatial condition 基于空间条件进行压缩 Step3: Model evaluation using residuals 采用残差进行模型评估 Output: Compressed Gaussian representation 压缩的高斯表示 |
| 8.5 | [8.5] 2503.22976 From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D [{'name': 'Jiahui Zhang, Yurui Chen, Yanpeng Zhou, Yueming Xu, Ze Huang, Jilin Mei, Junhui Chen, Yu-Jie Yuan, Xinyue Cai, Guowei Huang, Xingyue Quan, Hang Xu, Li Zhang'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models 3D reasoning spatial perception |
Input: 2D spatial data 2D空间数据 Step1: Data generation 数据生成 Step2: Annotation pipeline 注释管道 Step3: Model training 模型训练 Output: Improved performance on spatial tasks 改进的空间任务表现 |
| 8.5 | [8.5] 2503.23022 MeshCraft: Exploring Efficient and Controllable Mesh Generation with Flow-based DiTs [{'name': 'Xianglong He, Junyi Chen, Di Huang, Zexiang Liu, Xiaoshui Huang, Wanli Ouyang, Chun Yuan, Yangguang Li'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D reconstruction mesh generation efficient algorithms |
Input: Raw meshes 原始网格 Step1: Encoding raw meshes into continuous tokens 将原始网格编码为连续的标记 Step2: Decoding tokens back into mesh structure 将标记解码为网格结构 Step3: Generating mesh topology using diffusion model 使用扩散模型生成网格拓扑 Output: High-quality, controllable mesh generation 高质量、可控的网格生成 |
| 8.5 | [8.5] 2503.23365 OnSiteVRU: A High-Resolution Trajectory Dataset for High-Density Vulnerable Road Users [{'name': 'Zhangcun Yan, Jianqing Li, Peng Hang, Jian Sun'}] |
Autonomous Driving 自动驾驶 | v2 Vulnerable Road Users autonomous driving trajectory dataset |
Input: High-resolution trajectory data 高分辨率轨迹数据 Step1: Dataset development 数据集开发 Step2: Data integration 数据集成 Step3: Evaluation of trajectory coverage 轨迹覆盖评估 Output: Comprehensive VRU behavior representation 全面的VRU行为表现 |
| 8.5 | [8.5] 2503.23368 Towards Physically Plausible Video Generation via VLM Planning [{'name': 'Xindi Yang, Baolu Li, Yiming Zhang, Zhenfei Yin, Lei Bai, Liqian Ma, Zhiyong Wang, Jianfei Cai, Tien-Tsin Wong, Huchuan Lu, Xu Jia'}] |
Image and Video Generation 图像生成与视频生成 | v2 video generation vision-language models physics-aware motion planning |
Input: Image and text prompt 输入:图像和文本提示 Step1: VLM planning for motion trajectories VLM规划运动轨迹 Step2: Video generation with noise injection 噪声注入的视频生成 Output: Physically plausible videos 输出:物理上合理的视频 |
| 8.5 | [8.5] 2503.23508 Re-Aligning Language to Visual Objects with an Agentic Workflow [{'name': 'Yuming Chen, Jiangyan Feng, Haodong Zhang, Lijun Gong, Feng Zhu, Rui Zhao, Qibin Hou, Ming-Ming Cheng, Yibing Song'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 language-based object detection vision-language models data alignment |
Input: Image with detected objects 带有检测对象的图像 Step1: Generate raw language expressions 生成原始语言表达 Step2: Plan actions based on agent's reasoning 根据代理的推理计划行动 Step3: Adjust image and text prompts 调整图像和文本提示 Output: Re-aligned expressions with improved accuracy 输出: 精确度提高的重新对齐表达 |
| 8.5 | [8.5] 2503.23573 DASH: Detection and Assessment of Systematic Hallucinations of VLMs [{'name': 'Maximilian Augustin, Yannic Neuhaus, Matthias Hein'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models object hallucinations image retrieval |
Input: Real-world images 真实世界图像 Step1: Image-based retrieval 基于图像的检索 Step2: Clustering similar images 聚类相似图像 Step3: Evaluation of hallucinations 评估幻觉 Output: Identified clusters of hallucinatory images 识别的幻觉图像簇 |
| 8.5 | [8.5] 2503.23577 Multiview Image-Based Localization [{'name': 'Cameron Fiore, Hongyi Fan, Benjamin Kimia'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction visual localization multiview correspondence autonomous systems |
Input: Multiview images 多视角图像 Step1: Image feature extraction 图像特征提取 Step2: Relative translation estimation 计算相对平移 Step3: Optimal pose computation 从多视角对应中计算最优姿态 Output: Accurate localization results 准确定位结果 |
| 8.5 | [8.5] 2503.23587 PhysPose: Refining 6D Object Poses with Physical Constraints [{'name': "Martin Malenick\'y, Martin C\'ifka, M\'ed\'eric Fourmy, Louis Montaut, Justin Carpentier, Josef Sivic, Vladimir Petrik"}] |
6D Pose Estimation 6D对象姿态估计 | v2 6D pose estimation physical consistency object-centric scene understanding robotics |
Input: Image and geometric description of the scene 图像和场景几何描述 Step1: Estimate initial poses for objects 估计对象的初始姿态 Step2: Post-processing optimization with physical constraints 引入物理约束的后处理优化 Output: Refined 6D object poses 改进的6D对象姿态 |
| 8.5 | [8.5] 2503.23606 Blurry-Edges: Photon-Limited Depth Estimation from Defocused Boundaries [{'name': 'Wei Xu, Charles James Wagner, Junjie Luo, Qi Guo'}] |
Depth Estimation 深度估计 | v2 depth estimation 深度估计 depth from defocus 失焦深度 photon-limited images 光子限制图像 |
Input: Defocused images 失焦图像 Step1: Image patch representation 图像块表示 Step2: Neural network prediction 神经网络预测 Step3: Depth calculation using DfD equation 使用DfD方程计算深度 Output: Depth maps 深度图 |
| 8.5 | [8.5] 2503.23647 Introducing the Short-Time Fourier Kolmogorov Arnold Network: A Dynamic Graph CNN Approach for Tree Species Classification in 3D Point Clouds [{'name': 'Said Ohamouddoua, Mohamed Ohamouddoub, Rafik Lasrib, Hanaa El Afiaa, Raddouane Chiheba, Abdellatif El Afiaa'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Point Cloud Tree Species Classification Kolmogorov-Arnold Network |
Input: 3D point clouds from TLS and ALS 三维激光扫描点云 Step1: Implement Short-Time Fourier Transform (STFT) 使用短时傅里叶变换 Step2: Develop lightweight Dynamic Graph CNN (liteDGCNN) 开发轻量级动态图卷积神经网络 Step3: Evaluate performance and parameter reduction 评估性能和参数减少 Output: Classified tree species with reduced model complexity 输出:分类树种并降低模型复杂度 |
| 8.5 | [8.5] 2503.23702 3D Dental Model Segmentation with Geometrical Boundary Preserving [{'name': 'Shufan Xi, Zexian Liu, Junlin Chang, Hongyu Wu, Xiaogang Wang, Aimin Hao'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction tooth segmentation digital dentistry |
Input: 3D intraoral scan mesh 3D口腔扫描网格 Step1: Selective downsampling 选择性下采样 Step2: Boundary feature extraction 边界特征提取 Step3: Model evaluation 模型评估 Output: Improved tooth segmentation 提高的牙齿分割效果 |
| 8.5 | [8.5] 2503.23980 SALT: A Flexible Semi-Automatic Labeling Tool for General LiDAR Point Clouds with Cross-Scene Adaptability and 4D Consistency [{'name': 'Yanbo Wang, Yongtao Chen, Chuan Cao, Tianchen Deng, Wentao Zhao, Jingchuan Wang, Weidong Chen'}] |
3D Reconstruction and Modeling 三维重建 | v2 LiDAR zero-shot learning semi-automatic labeling |
Input: Raw LiDAR data 原始激光雷达数据 Step1: Data transformation 数据转换 Step2: Zero-shot learning application 零样本学习应用 Step3: Pre-segmentation generation 预分割生成 Output: High-quality pre-segmented LiDAR data 高质量预分割的激光雷达数据 |
| 8.5 | [8.5] 2503.24091 4D mmWave Radar in Adverse Environments for Autonomous Driving: A Survey [{'name': 'Xiangyuan Peng, Miao Tang, Huawei Sun, Lorenzo Servadei, Robert Wille'}] |
Autonomous Driving 自动驾驶 | v2 4D mmWave radar autonomous driving perception |
Input: 4D mmWave radar data 4D毫米波雷达数据 Step1: Review existing datasets 现有数据集综述 Step2: Analyze methods for perception and SLAM 感知与SLAM方法分析 Step3: Discuss challenges and future directions 挑战与未来方向讨论 Output: Comprehensive survey report 综合调查报告 |
| 8.5 | [8.5] 2503.24270 Visual Acoustic Fields [{'name': 'Yuelei Li, Hyunjin Kim, Fangneng Zhan, Ri-Zhao Qiu, Mazeyu Ji, Xiaojun Shan, Xueyan Zou, Paul Liang, Hanspeter Pfister, Xiaolong Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting Visual Acoustic Fields cross-modal learning |
Input: Visual and acoustic data 视觉和声学数据 Step1: Collect multiview images 收集多视角图像 Step2: Record impact sounds 记录撞击声音 Step3: Use structure-from-motion for camera pose estimation 使用运动恢复结构估计相机姿态 Step4: Implement 3D Gaussian Splatting for scene reconstruction 使用3D高斯点云重建场景 Step5: Generate sound based on visual cues 使用视觉线索生成声音 Step6: Localize sound sources within the scene 确定场景内声音源的位置 Output: Aligned visual-sound pairs 输出对齐的视觉-声音对 |
| 8.5 | [8.5] 2503.24381 UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving [{'name': 'Yuping Wang, Xiangyu Huang, Xiaokang Sun, Mingxuan Yan, Shuo Xing, Zhengzhong Tu, Jiachen Li'}] |
Occupancy Forecasting and Prediction in Autonomous Driving 自动驾驶中的占用预报与预测 | v2 Occupancy Forecasting 占用预测 Autonomous Driving 自动驾驶 3D Prediction 三维预测 |
Input: Multi-view images 多视角图像 Step1: Data unification 数据统一 Step2: Novel metric development 新指标开发 Step3: Algorithm validation 算法验证 Output: Unified occupancy predictions 统一的占用预测 |
| 8.0 | [8.0] 2503.24129 It's a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data [{'name': 'Dominik Schnaus, Nikita Araslanov, Daniel Cremers'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Correspondence 视觉-语言对应 Unsupervised Learning 无监督学习 |
Input: Vision and language embeddings 视觉和语言嵌入 Step1: Formulate matching as a quadratic assignment problem 将匹配公式化为二次分配问题 Step2: Develop a heuristic solver 发展启发式解法 Step3: Conduct extensive empirical study 开展广泛的实证研究 Output: Unsupervised classification outcomes 无监督分类结果 |
| 7.5 | [7.5] 2503.23105 Open-Vocabulary Semantic Segmentation with Uncertainty Alignment for Robotic Scene Understanding in Indoor Building Environments [{'name': 'Yifan Xu, Vineet Kamat, Carol Menassa'}] |
Autonomous Systems and Robotics 自主系统与机器人 | v2 semantic segmentation robotic navigation |
Input: Scene images 场景图像 Step1: Segment different rooms/regions of the scene 划分场景中的不同房间/区域 Step2: Leverage VLM to get similarity scores between descriptions and rooms 利用视觉语言模型获得描述与房间之间的相似度分数 Step3: Use adaptive conformal prediction (ACP) to select rooms according to similarity scores 使用自适应的保形预测根据相似度分数选择房间 Output: Enhanced robot navigation capabilities 提升机器人导航能力 |
| 7.5 | [7.5] 2503.23200 A GAN-Enhanced Deep Learning Framework for Rooftop Detection from Historical Aerial Imagery [{'name': 'Pengyu Chen, Sicheng Wang, Cuizhen Wang, Senrong Wang, Beiao Huang, Lu Huang, Zhe Zang'}] |
Image Generation 图像生成 | v2 rooftop detection historical imagery Generative Adversarial Networks image enhancement |
Input: Historical aerial imagery 历史航空图像 Step1: Image colorization using DeOldify 图像上色采用DeOldify Step2: Super-resolution enhancement using Real-ESRGAN 超分辨率增强采用Real-ESRGAN Step3: Train rooftop detection models 训练屋顶检测模型 Output: Improved rooftop detection performance 改进的屋顶检测性能 |
| 7.5 | [7.5] 2503.23388 COSMIC: Clique-Oriented Semantic Multi-space Integration for Robust CLIP Test-Time Adaptation [{'name': 'Fanding Huang, Jingyan Jiang, Qinting Jiang, Hebei Li, Faisal Nadeem Khan, Zhi Wang'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models Test-Time Adaptation |
Input: Test samples 测试样本 Step1: Cache refinement 缓存改进 Step2: Semantic graph construction 语义图构建 Step3: Hyper-class querying 超类查询 Output: Adapted predictions 适应性预测 |
| 7.5 | [7.5] 2503.24306 Point Tracking in Surgery--The 2024 Surgical Tattoos in Infrared (STIR) Challenge [{'name': "Adam Schmidt, Mert Asim Karaoglu, Soham Sinha, Mingang Jang, Ho-Gun Ha, Kyungmin Jung, Kyeongmo Gu, Ihsan Ullah, Hyunki Lee, Jon\'a\v{s} \v{S}er\'ych, Michal Neoral, Ji\v{r}\'i Matas, Rulin Zhou, Wenlong He, An Wang, Hongliang Ren, Bruno Silva, Sandro Queir\'os, Est\^ev\~ao Lima, Jo\~ao L. Vila\c{c}a, Shunsuke Kikuchi, Atsushi Kouno, Hiroki Matsuzaki, Tongtong Li, Yulu Chen, Ling Li, Xiang Ma, Xiaojian Li, Mona Sheikh Zeinoddin, Xu Wang, Zafer Tandogdu, Greg Shaw, Evangelos Mazomenos, Danail Stoyanov, Yuxin Chen, Zijian Wu, Alexander Ladikos, Simon DiMaio, Septimiu E. Salcudean, Omid Mohareri"}] |
3D Reconstruction and Modeling 三维重建 | v2 point tracking 3D reconstruction surgery autonomous probe-based scanning |
Input: Point tracking data for surgery 手术点跟踪数据 Step1: Challenge design 挑战设计 Step2: Algorithm submission and evaluation 算法提交与评估 Step3: Performance measurement based on accuracy and efficiency 性能测量基于准确性和效率 Output: Quantitative results for tracking algorithms 跟踪算法的定量结果 |
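
The StochasticSplats entry above (2503.24366) rests on one property worth spelling out: if each splat covering a pixel is treated as opaque with probability equal to its alpha, the expected color of the nearest surviving splat equals the standard front-to-back alpha composite, so no per-pixel sorting is required. Below is a minimal NumPy sketch of that unbiased Monte Carlo estimator for a single pixel; the setup and names are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: stochastic (sorting-free) alpha compositing for one pixel.
# Illustrative assumption only; not the StochasticSplats codebase.
import numpy as np

rng = np.random.default_rng(0)

def sorted_composite(colors, alphas, depths):
    """Reference: classic front-to-back alpha blending (requires a sort)."""
    out, transmittance = np.zeros(3), 1.0
    for i in np.argsort(depths):
        out += transmittance * alphas[i] * colors[i]
        transmittance *= 1.0 - alphas[i]
    return out

def stochastic_composite(colors, alphas, depths, n_samples=20000):
    """Unbiased estimate: each splat 'survives' with probability alpha and
    the nearest survivor wins, so only a depth min (no sort) is needed."""
    acc = np.zeros(3)
    for _ in range(n_samples):
        survive = rng.random(len(alphas)) < alphas
        if survive.any():
            acc += colors[np.argmin(np.where(survive, depths, np.inf))]
    return acc / n_samples

colors = rng.random((5, 3))
alphas = rng.uniform(0.1, 0.9, 5)
depths = rng.random(5)
print(sorted_composite(colors, alphas, depths))
print(stochastic_composite(colors, alphas, depths))  # agrees up to MC noise
```

The estimator is unbiased because the probability that splat i is the nearest survivor is alpha_i times the product of (1 - alpha_j) over all nearer splats j, which is exactly its weight in the sorted composite.
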
Arxiv 2025-03-31
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | [9.5] 2503.21958 NeRF-based Point Cloud Reconstruction using a Stationary Camera for Agricultural Applications [{'name': 'Kibon Ku, Talukder Z Jubery, Elijah Rodriguez, Aditya Balu, Soumik Sarkar, Adarsh Krishnamurthy, Baskar Ganapathysubramanian'}] |
3D Reconstruction 三维重建 | v2 3D reconstruction NeRF point cloud agriculture |
Input: Images captured by a stationary camera 静态相机捕获的图像 Step1: COLMAP-based pose estimation COLMAP基础的姿态估计 Step2: Pose transformation to simulate camera movement 姿态转换以模拟相机移动 Step3: NeRF training using captured images 使用捕获的图像进行NeRF训练 Output: High-resolution point clouds 高分辨率点云 |
| 9.5 | [9.5] 2503.22060 Deep Depth Estimation from Thermal Image: Dataset, Benchmark, and Challenges [{'name': 'Ukcheol Shin, Jinsun Park'}] |
Depth Estimation 深度估计 | v2 depth estimation thermal imaging autonomous driving multi-modal dataset robust perception |
Input: Synchronized multi-modal data 同步的多模态数据 Step1: Dataset construction 数据集构建 Step2: Depth estimation evaluation 深度估计评估 Step3: Benchmark analysis 基准分析 Output: Standardized benchmark results 标准化基准结果 |
| 9.5 | [9.5] 2503.22087 Mitigating Trade-off: Stream and Query-guided Aggregation for Efficient and Effective 3D Occupancy Prediction [{'name': 'Seokha Moon, Janghyun Baek, Giseop Kim, Jinkyu Kim, Sunwook Choi'}] |
3D Occupancy Prediction 3D占用预测 | v2 3D occupancy prediction 3D占用预测 autonomous driving 自动驾驶 multi-view images 多视角图像 |
Input: Multi-view images 多视角图像 Step1: Stream-based Voxel Aggregation 流式体素聚合 Step2: Query-guided Aggregation 查询引导聚合 Step3: Model evaluation 模型评估 Output: 3D occupancy prediction 3D占用预测 |
| 9.5 | [9.5] 2503.22154 Permutation-Invariant and Orientation-Aware Dataset Distillation for 3D Point Clouds [{'name': 'Jae-Young Yim, Dongwook Kim, Jae-Young Sim'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D point clouds dataset distillation feature alignment |
Input: 3D point clouds 3D点云 Step1: Permutation-invariant feature matching 排列不变特征匹配 Step2: Orientation optimization 方向优化 Step3: Dataset distillation 数据集蒸馏 Output: Optimized synthetic dataset 优化的合成数据集 (see the Chamfer distance sketch after this table) |
| 9.5 | [9.5] 2503.22204 Segment then Splat: A Unified Approach for 3D Open-Vocabulary Segmentation based on Gaussian Splatting [{'name': 'Yiren Lu, Yunlai Zhou, Yiran Qiao, Chaoda Song, Tuo Liang, Jing Ma, Yu Yin'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D segmentation 3D分割 Gaussian Splatting 高斯点云 autonomous systems 自主系统 |
Input: Multi-view images 多视角图像 Step1: Object-specific Gaussian initialization 面向对象的高斯初始化 Step2: Segmentation via Gaussian Splatting 通过高斯点云分割 Step3: Optimization and scene reconstruction 优化和场景重建 Output: 3D object segmentation output 3D对象分割结果 |
| 9.5 | [9.5] 2503.22231 CoGen: 3D Consistent Video Generation via Adaptive Conditioning for Autonomous Driving [{'name': 'Yishen Ji, Ziyue Zhu, Zhenxin Zhu, Kaixin Xiong, Ming Lu, Zhiqi Li, Lijun Zhou, Haiyang Sun, Bing Wang, Tong Lu'}] |
3D Generation 三维生成 | v2 3D generation autonomous driving video generation 3D consistency |
Input: HD maps and bounding boxes 用于视频生成的HD地图和边界框 Step1: Generate 3D conditions 生成3D条件 Step2: Develop spatially adaptive framework 开发空间自适应框架 Step3: Incorporate consistency adapter 添加一致性适配器 Output: High-quality driving videos 生成高质量的驾驶视频 |
| 9.5 | [9.5] 2503.22324 AH-GS: Augmented 3D Gaussian Splatting for High-Frequency Detail Representation [{'name': 'Chenyang Xu, XingGuo Deng, Rui Zhong'}] |
3D Reconstruction 三维重建 | v2 3D Gaussian Splatting 3D reconstruction Novel View Synthesis |
Input: Scene representation using 3D Gaussian Splatting 3D高斯点云 Step1: Enhance manifold complexity of input features 加强输入特征的流形复杂性 Step2: Implement Adaptive Frequency Encoding Module (AFEM) 实现自适应频率编码模块 Step3: Apply high-frequency reinforce loss 使用高频强化损失 Output: Improved rendering fidelity and high-frequency detail 改进的渲染保真度和高频细节 |
| 9.5 | [9.5] 2503.22328 VoteFlow: Enforcing Local Rigidity in Self-Supervised Scene Flow [{'name': 'Yancong Lin, Shiming Wang, Liangliang Nan, Julian Kooij, Holger Caesar'}] |
Scene Flow Estimation 场景流估计 | v2 scene flow motion rigidity autonomous driving |
Input: LiDAR scans from autonomous driving applications 来自自动驾驶应用的LiDAR扫描 Step1: Data collection 数据收集 Step2: Implementation of a Voting Module 投票模块的实施 Step3: Scene flow estimation with local rigidity 基于局部刚性的场景流估计 Output: Enhanced motion estimation 改进的运动估计 |
| 9.5 | [9.5] 2503.22349 GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion [{'name': 'Li-Heng Chen, Zi-Xin Zou, Chang Liu, Tianjiao Jing, Yan-Pei Cao, Shi-Sheng Huang, Hongbo Fu, Hua Huang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction pose-free reconstruction camera pose estimation sparse view surface reconstruction |
Input: Unposed images 未标定图像 Step1: Implement Geometric Consistent Ray Diffusion model (GCRayDiffusion) 实施几何一致的射线扩散模型 (GCRayDiffusion) Step2: Use triplane-based signed distance field (SDF) for learning 使用基于三平面的符号距离场 (SDF) 进行学习 Step3: Improve camera pose estimation and surface reconstruction through neural rays 通过神经射线改善相机位姿估计和表面重建 Output: Accurate pose-free surface reconstruction results 精确的无位姿表面重建结果 |
| 9.5 | [9.5] 2503.22430 MVSAnywhere: Zero-Shot Multi-View Stereo [{'name': 'Sergio Izquierdo, Mohamed Sayed, Michael Firman, Guillermo Garcia-Hernando, Daniyar Turmukhambetov, Javier Civera, Oisin Mac Aodha, Gabriel Brostow, Jamie Watson'}] |
Multi-view Stereo 多视角立体 | v2 Multi-View Stereo Depth Estimation 3D Reconstruction Zero-Shot Learning |
Input: Multiple posed RGB images 多张带位姿的RGB图像 Step 1: Depth estimation using transformer architecture 使用Transformer架构进行深度估计 Step 2: Cost volume construction using geometric metadata 使用几何元数据构造代价体积 Step 3: Model evaluation and comparison with baselines 模型评估及与基线比较 Output: Accurate and 3D-consistent depth maps 输出:准确且三维一致的深度图 |
| 9.5 | [9.5] 2503.22436 NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving [{'name': 'Fuhao Li, Huan Jin, Bin Gao, Liaoyuan Fan, Lihui Jiang, Long Zeng'}] |
3D Visual Grounding 视觉定位 | v2 multi-view 3D visual grounding autonomous driving language grounding object localization |
Input: Multi-view images 多视角图像 Step1: Data integration 数据集成 Step2: Instruction processing 指令处理 Step3: Localization using 3D geometric information 使用3D几何信息进行定位 Output: Localized target objects 定位后的目标对象 |
| 9.5 | [9.5] 2503.22437 EndoLRMGS: Complete Endoscopic Scene Reconstruction combining Large Reconstruction Modelling and Gaussian Splatting [{'name': 'Xu Wang, Shuai Zhang, Baoru Huang, Danail Stoyanov, Evangelos B. Mazomenos'}] |
3D Reconstruction 三维重建 | v2 3D Reconstruction Endoscopic Surgery Gaussian Splatting |
Input: Endoscopic videos 内窥镜视频 Step1: Depth estimation 深度估计 Step2: Model generation 模型生成 Step3: Scene reconstruction 场景重建 Output: Complete 3D surgical scenes 完整的三维手术场景 |
| 9.5 | [9.5] 2503.22537 LIM: Large Interpolator Model for Dynamic Reconstruction [{'name': 'Remy Sabathier, Niloy J. Mitra, David Novotny'}] |
Dynamic Reconstruction 动态重建 | v2 4D reconstruction implicit 3D representations mesh tracking |
Input: Implicit 3D representations at times t0 and t1 Step1: Interpolation using causal consistency loss Step2: Mesh tracking across time Output: High-speed tracked 4D assets |
| 9.5 | [9.5] 2503.22676 TranSplat: Lighting-Consistent Cross-Scene Object Transfer with 3D Gaussian Splatting [{'name': 'Boyang (Tony) Yu, Yanlin Jin, Ashok Veeraraghavan, Akshat Dave, Guha Balakrishnan'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction Gaussian Splatting object transfer relighting scene rendering |
Input: Multi-view images containing objects from a source scene with delineating masks. Step1: Fit a Gaussian Splatting model to both source and target scenes for object extraction and environment mapping. Step2: Perform 3D object segmentation based on 2D masks to extract precise object geometry. Step3: User-guided insertion of the extracted object into the target scene with automatic position and orientation refinement. Step4: Calculate per-Gaussian radiance transfer functions via spherical harmonic analysis to adapt object's appearance for the target scene lighting. Output: Realistically transferred 3D objects in the target scene. |
| 9.5 | [9.5] 2503.22677 DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness [{'name': 'Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi'}] |
3D Generation 三维生成 | v2 3D reconstruction physical stability simulation feedback |
Input: 3D object images 物体图像 Step1: Construct stability score dataset 构建稳定性评分数据集 Step2: Fine-tune 3D generator using stability scores 使用稳定性评分微调3D生成器 Step3: Evaluate physical stability 评估物理稳定性 Output: Physically stable 3D objects 物理稳定的3D对象 |
| 9.2 | [9.2] 2503.22218 ABC-GS: Alignment-Based Controllable Style Transfer for 3D Gaussian Splatting [{'name': 'Wenjie Liu, Zhongliang Liu, Xiaoyan Yang, Man Sha, Yang Li'}] |
Neural Rendering 神经渲染 | v2 3D style transfer Neural Rendering 3D Gaussian Splatting |
Input: Scene content and style images 场景内容和风格图像 Step1: Controllable matching of images 可控图像匹配 Step2: Feature alignment for style transfer 特征对齐以进行风格转换 Step3: Style transfer with depth preservation 保持深度的风格转换 Output: Stylized 3D scenes 风格化的三维场景 |
| 9.2 | [9.2] 2503.22351 One Look is Enough: A Novel Seamless Patchwise Refinement for Zero-Shot Monocular Depth Estimation Models on High-Resolution Images [{'name': 'Byeongjun Kwon, Munchurl Kim'}] |
Depth Estimation 深度估计 | v2 monocular depth estimation high-resolution images depth discontinuity |
Input: High-resolution images 高分辨率图像 Step1: Grouped Patch Consistency Training 组块一致性训练 Step2: Bias Free Masking 去偏见掩码 Step3: Depth refinement on each patch 每个块的深度修正 Output: Accurate depth estimation results 准确的深度估计结果 |
| 8.5 | [8.5] 2503.21830 Shape Generation via Weight Space Learning [{'name': 'Maximilian Plattner, Arturs Berzins, Johannes Brandstetter'}] |
3D Generation 三维生成 | v2 3D shape generation weight space learning topology geometry phase transition |
Input: 3D shape-generative model 3D形状生成模型 Step1: Analyze weight space 权重空间分析 Step2: Experiment with phase transitions 进行相变实验 Step3: Ensure controlled geometry changes 确保可控的几何变化 Output: Enhanced shape generation capabilities 改进的形状生成能力 |
| 8.5 | [8.5] 2503.22020 CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models [{'name': 'Qingqing Zhao, Yao Lu, Moo Jin Kim, Zipeng Fu, Zhuoyang Zhang, Yecheng Wu, Zhaoshuo Li, Qianli Ma, Song Han, Chelsea Finn, Ankur Handa, Ming-Yu Liu, Donglai Xiang, Gordon Wetzstein, Tsung-Yi Lin'}] |
Robotics and Vision-Language Models 机器人和视觉语言模型 | v2 vision-language-action models robot manipulation visual reasoning chain-of-thought reasoning |
Input: Visual-language-action models 视觉语言动作模型 Step1: Incorporate visual chain-of-thought reasoning 引入视觉思维链推理 Step2: Generate subgoal images 生成子目标图像 Step3: Predict action sequences 预测动作序列 Output: Enhanced robotic control capabilities 增强的机器人控制能力 |
| 8.5 | [8.5] 2503.22093 How Well Can Vison-Language Models Understand Humans' Intention? An Open-ended Theory of Mind Question Evaluation Benchmark [{'name': 'Ximing Wen, Mallika Mainali, Anik Sen'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models Theory of Mind Visual Question Answering Human Intentions Multimodal Learning |
Input: Visual scenarios and VLMs 输入:视觉场景和视觉语言模型 Step1: Develop open-ended question framework 开发开放式问题框架 Step2: Curate and annotate benchmark dataset 策划和注释基准数据集 Step3: Assess performance of VLMs 评估视觉语言模型的性能 Output: Evaluation results and insights 输出:评估结果和见解 |
| 8.5 | [8.5] 2503.22194 ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation [{'name': 'Yunhong Min, Daehyeon Choi, Kyeongmin Yeo, Jihyun Lee, Minhyuk Sung'}] |
Image Generation 图像生成 | v2 3D orientation grounding text-to-image generation |
Input: Text prompts and multi-view objects 文字提示和多视角对象 Step 1: 3D orientation estimation for multiple objects 多个对象的3D方向估计 Step 2: Reward-guided sampling using Langevin dynamics 使用Langevin动力学进行奖励引导采样 Step 3: Model evaluation and comparison with existing methods 模型评估与现有方法对比 Output: 3D-oriented images 3D定向图像 |
| 8.5 | [8.5] 2503.22201 Multi-modal Knowledge Distillation-based Human Trajectory Forecasting [{'name': 'Jaewoo Jeong, Seohee Lee, Daehee Park, Giwon Lee, Kuk-Jin Yoon'}] |
Autonomous Systems and Robotics 自动驾驶 | v2 trajectory forecasting knowledge distillation autonomous driving multi-modal systems |
Input: Limited modality student model 受限的模态学生模型 Step1: Train teacher model with full modalities 训练全模态教师模型 Step2: Distill knowledge to student model 从教师模型向学生模型蒸馏知识 Step3: Validate with datasets 验证数据集 Output: Enhanced prediction accuracy 改进的预测精度 |
| 8.5 | [8.5] 2503.22209 Intrinsic Image Decomposition for Robust Self-supervised Monocular Depth Estimation on Reflective Surfaces [{'name': 'Wonhyeok Choi, Kyumin Hwang, Minwoo Choi, Kiljoon Han, Wonjoon Choi, Mingyu Shin, Sunghoon Im'}] |
Depth Estimation 深度估计 | v2 monocular depth estimation intrinsic image decomposition self-supervised learning |
Input: Sequential images 序列图像 Step1: Data integration 数据集成 Step2: Algorithm development 算法开发 Step3: Model training and evaluation 模型训练与评估 Output: Depth prediction and intrinsic images 深度预测与内在图像 |
| 8.5 | [8.5] 2503.22262 Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion [{'name': 'Songsong Yu, Yuxin Chen, Zhongang Qi, Zeke Xie, Yifan Wang, Lijun Wang, Ying Shan, Huchuan Lu'}] |
Multi-view and Stereo Vision 多视角立体 | v2 stereo conversion evaluation metric 3D content production |
Input: Monocular images 单目图像 Step1: Dataset creation 数据集创建 Step2: Empirical evaluation 实证评估 Step3: New metric proposal 新指标提出 Output: Enhanced stereo conversion model 改进的立体转换模型 |
| 8.5 | [8.5] 2503.22309 A Dataset for Semantic Segmentation in the Presence of Unknowns [{'name': 'Zakaria Laskar, Tomas Vojir, Matej Grcic, Iaroslav Melekhov, Shankar Gangisettye, Juho Kannala, Jiri Matas, Giorgos Tolias, C. V. Jawahar'}] |
Image and Video Generation 图像生成 | v2 semantic segmentation autonomous driving anomaly detection |
Input: Real-world images from diverse environments 来自多样环境的真实图像 Step 1: Dataset creation 数据集创建 Step 2: Labeling with closed-set and anomaly classes 闭集与异常类别标注 Step 3: Controlled evaluation 受控评估 Output: Comprehensive anomaly segmentation dataset 综合异常分割数据集 |
| 8.5 | [8.5] 2503.22375 Data Quality Matters: Quantifying Image Quality Impact on Machine Learning Performance [{'name': 'Christian Steinhauser, Philipp Reis, Hubert Padusinski, Jacob Langner, Eric Sax'}] |
Autonomous Driving 自动驾驶 | v2 image quality machine learning automotive perception object detection segmentation |
Input: Modified images from automotive datasets 经过修改的汽车数据集中的图像 Step1: Data preparation 数据准备 Step2: Quantification of image deviations 图像偏差的量化 Step3: Performance evaluation of ML models 机器学习模型性能评估 Step4: Correlation analysis of image quality and performance 图像质量与性能的相关性分析 Output: Insights into the impact of image quality on ML performance 输出:图像质量对机器学习性能的影响见解 |
| 8.5 | [8.5] 2503.22420 Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis [{'name': 'Jiangyong Huang, Baoxiong Jia, Yan Wang, Ziyu Zhu, Xiongkun Linghu, Qing Li, Song-Chun Zhu, Siyuan Huang'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 3D vision-language benchmarking QA tasks |
Input: 3D vision-language models 3D视觉语言模型 Step1: Benchmark evaluation 基准评估 Step2: Object-centric testing 物体中心测试 Step3: Performance analysis 性能分析 Output: Comprehensive metrics for 3D-VL models 3D-VL模型的综合性能指标 |
| 8.5 | [8.5] 2503.22462 SemAlign3D: Semantic Correspondence between RGB-Images through Aligning 3D Object-Class Representations [{'name': 'Krispin Wandel, Hesheng Wang'}] |
3D Object Recognition 物体识别 | v2 3D object-class representations semantic correspondence |
Input: RGB images and monocular depth estimates RGB图像和单目深度估计 Step1: Build 3D object-class representations from depth estimates 从深度估计构建3D物体类别表示 Step2: Formulate alignment energy using gradient descent 使用梯度下降公式化对齐能量 Step3: Minimize alignment energy to establish correspondence 最小化对齐能量以建立对应关系 Output: Robust semantic correspondence across varying views 输出:在变化视角下的鲁棒语义对应 |
| 8.5 | [8.5] 2503.22622 Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model [{'name': 'Jangho Park, Taesung Kwon, Jong Chul Ye'}] |
Image and Video Generation 图像生成与视频生成 | v2 4D video generation video diffusion models spatio-temporal consistency |
Input: Single monocular video 单个单目视频 Step1: Synthesize edge frames using video diffusion model 使用视频扩散模型合成边缘帧 Step2: Interpolate remaining frames to construct a coherent sampling grid 插值剩余帧以构建一致的采样网格 Output: Multi-view synchronized 4D video 生成多视角同步4D视频 |
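
Several entries above match point sets or their features without assuming any ordering, most explicitly the permutation-invariant dataset distillation of 2503.22154; such methods typically build on an order-free set distance. A minimal NumPy sketch of the symmetric Chamfer distance follows; shapes and names are assumptions for illustration, not the paper's distillation objective.

```python
# Minimal sketch: symmetric Chamfer distance, a permutation-invariant
# comparison between point sets. Generic illustration, not 2503.22154's loss.
import numpy as np

def chamfer_distance(p, q):
    """Chamfer distance between point sets p (N,3) and q (M,3); the value is
    unchanged under any permutation of the rows of p or q."""
    d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

rng = np.random.default_rng(1)
cloud = rng.random((128, 3))
shuffled = rng.permutation(cloud)            # same set, different row order
print(chamfer_distance(cloud, shuffled))     # 0.0: ordering is irrelevant
print(chamfer_distance(cloud, cloud + 0.1))  # > 0 once the geometry differs
```
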
Arxiv 2025-03-28
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | [9.5] 2503.21082 Can Video Diffusion Model Reconstruct 4D Geometry? [{'name': 'Jinjie Mai, Wenxuan Zhu, Haozhe Liu, Bing Li, Cheng Zheng, J\"urgen Schmidhuber, Bernard Ghanem'}] |
3D Reconstruction and Modeling 三维重建 | v2 4D geometry reconstruction 4D几何重建 video diffusion model 视频扩散模型 |
Input: Monocular video 单目视频 Step1: Adapt a pointmap VAE from a pretrained video VAE 从预训练视频VAE适应一个点图VAE Step2: Finetune a diffusion backbone in combined video and pointmap latent space 在结合视频和点图潜在空间中微调扩散骨干 Output: Coherent 4D pointmaps 统一的4D点图 |
| 9.5 | [9.5] 2503.21104 StyledStreets: Multi-style Street Simulator with Spatial and Temporal Consistency [{'name': 'Yuyin Chen, Yida Wang, Xueyang Zhang, Kun Zhan, Peng Jia, Yifei Zhan, Xianpeng Lang'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D reconstruction 三维重建 autonomous driving 自动驾驶 urban simulation 城市模拟 |
Input: Street scenes with multi-camera setups 街景与多摄像头设置 Step1: Pose optimization for cameras 摄像头姿态优化 Step2: Hybrid embedding for scene and style separation 场景与风格分离的混合嵌入 Step3: Uncertainty-aware rendering for consistent output 不确定性感知渲染以确保一致性 Output: Photo-realistic urban scenes with invariant geometry 输出: 保持几何不变的照片真实感城市场景 |
| 9.5 | [9.5] 2503.21214 VoxRep: Enhancing 3D Spatial Understanding in 2D Vision-Language Models via Voxel Representation [{'name': 'Alan Dao (Gia Tuan Dao), Norapat Buppodom'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D understanding 3D理解 Voxel representation 体素表示 Vision-Language Models 视觉语言模型 |
Input: 3D voxel grid 3D体素网格 Step1: Decompose into 2D slices 沿主要轴切分成2D切片 Step2: Format and feed into VLM 输入格式化并送入视觉语言模型 Step3: Aggregate and interpret features 聚合并解释特征 Output: Structured voxel semantics 输出结构化的体素语义 |
| 9.5 | [9.5] 2503.21219 GenFusion: Closing the Loop between Reconstruction and Generation via Videos [{'name': 'Sibo Wu, Congrong Xu, Binbin Huang, Andreas Geiger, Anpei Chen'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction video generation novel view synthesis |
Input: RGB-D videos RGB-D视频 Step1: Fine-tune video model 视频模型微调 Step2: Masked 3D reconstruction 带遮罩的3D重建 Step3: Cyclic fusion pipeline 循环融合流程 Output: Artifact-free 3D models 无伪影的三维模型 |
| 9.5 | [9.5] 2503.21226 Frequency-Aware Gaussian Splatting Decomposition [{'name': 'Yishai Lavi, Leo Segre, Shai Avidan'}] |
Neural Rendering 神经渲染 | v2 3D Gaussian Splatting frequency decomposition 3D editing view synthesis |
Input: Images for Gaussian Splatting 输入图像用于高斯点云处理 Step1: Group 3D Gaussians based on frequency subbands 按照频率子带分组3D高斯 Step2: Apply dedicated regularization to maintain coherence 应用特殊正则化以保持一致性 Step3: Implement a progressive training scheme for optimization 实施渐进式训练方案以优化 Output: Frequency-aware 3D representation with enhanced editing capabilities 输出: 增强编辑能力的频率感知3D表示 |
| 9.5 | [9.5] 2503.21313 HORT: Monocular Hand-held Objects Reconstruction with Transformers [{'name': 'Zerui Chen, Rolandos Alexandros Potamias, Shizhe Chen, Cordelia Schmid'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction dense point clouds transformers |
Input: Monocular images 单目图像 Step1: Generate sparse point cloud 生成稀疏点云 Step2: Refine to dense representation 精炼到密集表示 Step3: Jointly predict object point cloud and pose 共同预测物体点云和姿态 Output: High-resolution 3D point clouds 输出高分辨率3D点云 |
| 9.5 | [9.5] 2503.21364 LandMarkSystem Technical Report [{'name': 'Zhenxiang Ma, Zhenyu Yang, Miao Tao, Yuanzhen Zhou, Zeyu He, Yuchang Zhang, Rong Fu, Hengjie Li'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction Neural Radiance Fields 3D Gaussian Splatting autonomous driving |
Input: Multi-view images 多视角图像 Step1: Componentized model adaptation 组件化模型自适应 Step2: Distributed parallel computing 分布式并行计算 Step3: Dynamic loading strategy 动态加载策略 Output: Enhanced 3D reconstruction and rendering 改进的三维重建与渲染 |
| 9.5 | [9.5] 2503.21449 Towards Generating Realistic 3D Semantic Training Data for Autonomous Driving [{'name': 'Lucas Nunes, Rodrigo Marcuzzi, Jens Behley, Cyrill Stachniss'}] |
3D Generation 三维生成 | v2 3D semantic generation data annotation autonomous driving |
Input: Semantic scene data 语义场景数据 Step1: Train a diffusion model 训练扩散模型 Step2: Generate realistic 3D semantic scenes 生成真实的3D语义场景 Step3: Evaluate synthetic data for training 评估合成数据的训练效果 Output: Improved semantic segmentation performance 改进的语义分割性能 |
| 9.5 | [9.5] 2503.21525 ICG-MVSNet: Learning Intra-view and Cross-view Relationships for Guidance in Multi-View Stereo [{'name': 'Yuxi Hu, Jun Zhang, Zhe Zhang, Rafael Weilharter, Yuchen Rao, Kuangyi Chen, Runze Yuan, Friedrich Fraundorfer'}] |
Multi-view Stereo 多视角立体 | v2 Multi-view Stereo 3D reconstruction depth estimation |
Input: Series of overlapping images 重叠的图像序列 Step1: Feature extraction 特征提取 Step2: Intra-view feature fusion 视图内特征融合 Step3: Cross-view aggregation 跨视图聚合 Step4: Depth estimation 深度估计 Output: 3D point cloud 3D点云 |
| 9.5 | [9.5] 2503.21581 AlignDiff: Learning Physically-Grounded Camera Alignment via Diffusion [{'name': 'Liuyue Xie, Jiancong Guo, Ozan Cakmakci, Andre Araujo, Laszlo A. Jeni, Zhiheng Jia'}] |
3D Perception and Calibration 三维感知与标定 | v2 camera calibration 3D perception diffusion model |
Input: Video sequences 视频序列 Step1: Condition diffusion model with line embeddings 利用线嵌入条件化扩散模型 Step2: Edge-aware attention on geometric features 边缘感知注意力聚焦几何特征 Step3: Joint estimation of intrinsic and extrinsic parameters 同时估计内外参数 Output: Accurate camera calibration outputs 准确的相机标定输出 |
| 9.5 | [9.5] 2503.21659 InteractionMap: Improving Online Vectorized HDMap Construction with Interaction [{'name': 'Kuang Wu, Chuan Yang, Zhanbin Li'}] |
3D Reconstruction and Modeling 三维重建 | v2 HD maps autonomous driving map vectorization |
Input: High-definition map data 高精度地图数据 Step1: Enhance detectors using position relation embedding 利用位置关系嵌入增强检测器 Step2: Key-frame-based hierarchical temporal fusion 基于关键帧的分层时间融合 Step3: Introduce geometry-aware classification loss 引入几何感知分类损失 Output: Improved vectorized HD map outputs 改进的矢量化高清地图输出 |
| 9.5 | [9.5] 2503.21692 RapidPoseTriangulation: Multi-view Multi-person Whole-body Human Pose Triangulation in a Millisecond [{'name': 'Daniel Bermuth, Alexander Poeppel, Wolfgang Reif'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D reconstruction pose estimation multi-view |
Input: Multi-view images and 2D poses 多视角图像和2D姿势 Step1: Predict 2D poses for each image 预测每幅图像的2D姿势 Step2: Filter pairs of poses using previous 3D poses 使用先前3D姿势筛选姿势对 Step3: Triangulate to create 3D proposals 三角测量生成3D提案 Step4: Reproject and evaluate reprojection error 重新投影并评估重投影误差 Output: Accurate 3D human poses 准确的3D人类姿势 (see the DLT triangulation sketch after this table) |
| 9.5 | [9.5] 2503.21732 SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling [{'name': 'Xianglong He, Zi-Xin Zou, Chia-Hao Chen, Yuan-Chen Guo, Ding Liang, Chun Yuan, Wanli Ouyang, Yan-Pei Cao, Yangguang Li'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction mesh modeling high-resolution shapes |
Input: Sparse-structured isosurface representation 稀疏结构的等值面表示 Step1: Frustum-aware sectional voxel training 分段体素训练 Step2: Differentiable mesh reconstruction 可微分网格重建 Step3: Shape modeling pipeline construction 形状建模管道构建 Output: High-resolution 3D models 高分辨率三维模型 |
| 9.5 | [9.5] 2503.21745 3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models [{'name': 'Yuhan Zhang, Mengchen Zhang, Tong Wu, Tengfei Wang, Gordon Wetzstein, Dahua Lin, Ziwei Liu'}] |
3D Generation 三维生成 | v2 3D evaluation 3D评估 3D generation 3D生成 human preference 人类偏好 |
Input: Text and image prompts 文本和图像提示 Step1: Develop 3DGen-Arena platform 开发3DGen-Arena平台 Step2: Gather human preferences 收集人类偏好 Step3: Train scoring models 训练评分模型 Output: 3DGen-Bench dataset 生成3DGen-Bench数据集 |
| 9.5 | [9.5] 2503.21761 Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video [{'name': 'David Yifan Yao, Albert J. Zhai, Shenlong Wang'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 4D modeling 3D reconstruction dynamic scenes optimization |
Input: Casual video inputs 从普通视频输入 Step1: Multi-stage optimization framework 多阶段优化框架 Step2: Integration of pretrained models 集成预训练模型 Step3: Estimation of camera poses, static and dynamic geometry and motion 相机姿态、静态和动态几何与运动的估计 Output: Accurate 4D scene models 生成准确的4D场景模型 |
| 9.5 | [9.5] 2503.21766 Stable-SCore: A Stable Registration-based Framework for 3D Shape Correspondence [{'name': 'Haolin Liu, Xiaohang Zhan, Zizheng Yan, Zhongjin Luo, Yuxin Wen, Xiaoguang Han'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D Shape Correspondence Registration-based Framework Neural Rendering |
Input: Source and target mesh 源网格和目标网格 Step1: Registration of source mesh to target mesh 将源网格配准到目标网格 Step2: Establish dense correspondence between shapes 建立形状间的稠密对应关系 Step3: Apply Semantic Flow Guided Registration 使用语义流引导配准 Output: Stable dense correspondence 稳定的稠密对应输出 |
| 9.5 | [9.5] 2503.21767 Semantic Consistent Language Gaussian Splatting for Point-Level Open-vocabulary Querying [{'name': 'Hairong Yin, Huangying Zhan, Yi Xu, Raymond A. Yeh'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting open-vocabulary querying point-level querying |
Input: 3D Gaussian representation 3D高斯表示 Step1: Utilize masklets for ground-truth generation 利用masklet生成真值标注 Step2: Implement a two-step querying process 实现两步查询过程 Output: Retrieved relevant 3D Gaussians 检索到的相关3D高斯 |
| 9.5 | [9.5] 2503.21778 HS-SLAM: Hybrid Representation with Structural Supervision for Improved Dense SLAM [{'name': 'Ziren Gong, Fabio Tosi, Youmin Zhang, Stefano Mattoccia, Matteo Poggi'}] |
Simultaneous Localization and Mapping (SLAM) 同时定位与地图构建 | v2 Dense SLAM 3D reconstruction Structural Supervision |
Input: RGB-D data with potential structure scenes RGB-D 数据与潜在结构场景 Step1: Hybrid encoding network to enhance scene representation 混合编码网络以增强场景表示 Step2: Structural supervision for scene understanding 结构监督以理解场景 Step3: Active global bundle adjustment for consistency 主动全局光束法平差以确保一致性 Output: Accurate dense maps with improved tracking and reconstruction 准确的密集地图及改进的跟踪与重建 |
| 9.2 | [9.2] 2503.21751 Reconstructing Humans with a Biomechanically Accurate Skeleton [{'name': 'Yan Xia, Xiaowei Zhou, Etienne Vouga, Qixing Huang, Georgios Pavlakos'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction biomechanical skeleton transformer |
Input: Single image 单幅图像 Step1: Generate pseudo ground truth 生成伪真实数据 Step2: Train transformer to estimate parameters 训练变换器以估计参数 Step3: Iterative refinement of pseudo labels 伪标签的迭代优化 Output: 3D human reconstruction 3D人体重建 |
| 8.5 | [8.5] 2503.20936 LATTE-MV: Learning to Anticipate Table Tennis Hits from Monocular Videos [{'name': 'Daniel Etaat, Dvij Kalaria, Nima Rahmanian, Shankar Sastry'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction anticipatory control table tennis robotics |
Input: Monocular videos 单目视频 Step1: Data integration 数据集成 Step2: 3D reconstruction 3D重建 Step3: Anticipatory control algorithm 预测控制算法 Output: Enhanced ball return rate 提高的回球率 |
| 8.5 | [8.5] 2503.21099 Learning Class Prototypes for Unified Sparse Supervised 3D Object Detection [{'name': 'Yun Zhu, Le Hui, Hang Yang, Jianjun Qian, Jin Xie, Jian Yang'}] |
3D Object Detection 3D目标检测 | v2 3D object detection sparse supervision prototypes indoor and outdoor scenes |
Input: Sparse supervised 3D object detection data 稀疏监督3D目标检测数据 Step1: Prototype-based object mining module 原型基础的对象挖掘模块 Step2: Optimal transport matching 最优传输匹配 Step3: Multi-label cooperative refinement module 多标签协同精练模块 Output: Enhanced detection performance 改进的检测性能 |
| 8.5 | [8.5] 2503.21268 ClimbingCap: Multi-Modal Dataset and Method for Rock Climbing in World Coordinate [{'name': 'Ming Yan, Xincheng Lin, Yuhua Luo, Shuqi Fan, Yudi Dai, Qixin Zhong, Lincai Zhong, Yuexin Ma, Lan Xu, Chenglu Wen, Siqi Shen, Cheng Wang'}] |
Human Motion Recovery 人体运动恢复 | v2 3D reconstruction human motion recovery autonomous driving |
Input: RGB and LiDAR data Step1: Collecting and annotating climbing motion data Step2: Developing ClimbingCap method for motion reconstruction Step3: Evaluating performance on climbing motion recovery Output: Continuous 3D human climbing motion in global coordinates |
| 8.5 | [8.5] 2503.21338 UGNA-VPR: A Novel Training Paradigm for Visual Place Recognition Based on Uncertainty-Guided NeRF Augmentation [{'name': 'Yehui Shen, Lei Zhang, Qingqiu Li, Xiongwei Zhao, Yue Wang, Huimin Lu, Xieyuanli Chen'}] |
Visual Place Recognition 视觉地点识别 | v2 Visual Place Recognition NeRF Data Augmentation Autonomous Navigation 3D reconstruction |
Input: Existing VPR dataset 现有VPR数据集 Step1: Train NeRF using existing VPR data 使用现有VPR数据训练NeRF Step2: Identify high uncertainty places using uncertainty estimation network 使用不确定性估计网络识别高不确定性的位置 Step3: Generate synthetic observations with selected poses through NeRF 通过NeRF生成选定姿态的合成观测 Output: Enhanced VPR training data 改进的VPR训练数据 |
| 8.5 | [8.5] 2503.21477 Fine-Grained Behavior and Lane Constraints Guided Trajectory Prediction Method [{'name': 'Wenyi Xiong, Jian Chen, Ziheng Qi'}] |
Autonomous Systems and Robotics 自主系统与机器人 | v2 trajectory prediction 轨迹预测 autonomous driving 自动驾驶 lane constraints 车道约束 |
Input: Trajectory data 轨迹数据 Step1: Behavioral intention recognition 行为意图识别 Step2: Lane constraint modeling 车道约束建模 Step3: Dual-stream architecture integration 双流架构集成 Step4: Trajectory proposal generation 轨迹提议生成 Step5: Point-level refinement 点级细化 Output: Fine-grained trajectory predictions 精细化轨迹预测 |
| 8.5 | [8.5] 2503.21562 uLayout: Unified Room Layout Estimation for Perspective and Panoramic Images [{'name': 'Jonathan Lee, Bolivar Solarte, Chin-Hsuan Wu, Jin-Cheng Jhang, Fu-En Wang, Yi-Hsuan Tsai, Min Sun'}] |
3D Reconstruction and Modeling 三维重建 | v2 room layout estimation 3D reconstruction panoramic images |
Input: Perspective and panoramic images 透视和全景图像 Step1: Project input images into equirectangular coordinates 将输入图像投影到等经纬度坐标 Step2: Use shared feature extractor with domain-specific conditioning 使用共享特征提取器并进行领域特定条件处理 Step3: Apply column-wise feature regression 应用逐列特征回归 Output: Estimated room layout geometries 估计的房间布局几何 |
| 8.5 | [8.5] 2503.21723 OccRobNet : Occlusion Robust Network for Accurate 3D Interacting Hand-Object Pose Estimation [{'name': 'Mallika Garg, Debashis Ghosh, Pyari Mohan Pradhan'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D hand pose estimation occlusion autonomous systems CNN transformer |
Input: RGB image RGB图像 Step1: Localizing hand joints using CNN 定位手关节采用CNN Step2: Refining joint estimates using contextual information 使用上下文信息细化关节估计 Step3: Identifying joints with self-attention and cross-attention mechanisms 使用自注意力和交叉注意力机制识别关节 Output: Accurate 3D hand-object pose estimates 输出: 精确的3D手-物体姿态估计 |
| 8.5 | [8.5] 2503.21755 VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness [{'name': 'Dian Zheng, Ziqi Huang, Hongbo Liu, Kai Zou, Yinan He, Fan Zhang, Yuanhan Zhang, Jingwen He, Wei-Shi Zheng, Yu Qiao, Ziwei Liu'}] |
Image and Video Generation 图像生成和视频生成 | v2 Video Generation 视频生成 Intrinsic Faithfulness 内在真实 |
Input: Video generative models 视频生成模型 Step1: Establish evaluation metrics 建立评估指标 Step2: Benchmark development 基准开发 Step3: Model assessment 模型评估 Output: Intrinsically faithful video generation outputs 本质真实的视频生成结果 |
| 8.5 | [8.5] 2503.21779 X$^{2}$-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction [{'name': 'Weihao Yu, Yuanhao Cai, Ruyi Zha, Zhiwen Fan, Chenxin Li, Yixuan Yuan'}] |
3D Reconstruction and Modeling 三维重建 | v2 4D CT reconstruction Gaussian splatting dynamic imaging respiratory motion learning |
Input: Projections of dynamic anatomical structures 3D动态解剖结构的投影 Step1: Model continuous anatomical motion 建模连续解剖运动 Step2: Apply radiative Gaussian splatting 应用辐射高斯点云 Step3: Implement self-supervised learning 实现自监督学习 Output: 4D CT reconstruction of continuous motion 输出:连续运动的4D CT重建 |
| 7.5 | [7.5] 2503.21483 BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding [{'name': 'Shuming Liu, Chen Zhao, Tianqi Xu, Bernard Ghanem'}] |
VLM & VLA 视觉语言模型与视觉语言动作模型 | v2 Video-Language Models frame selection video understanding |
Input: Long-form videos 长视频 Step1: Frame selection strategy evaluation 帧选择策略评估 Step2: Implementation of inverse transform sampling 逆变换采样的实现 Step3: Performance assessment on video benchmarks 视频基准上的性能评估 Output: Improved video understanding performance 提升的视频理解性能 |
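
The RapidPoseTriangulation pipeline above (2503.21692) triangulates 3D joints from multi-view 2D poses and scores them by reprojection error. The textbook core of such a step is the direct linear transform (DLT): each view contributes two linear constraints and the 3D point is the least-squares solution via SVD. The NumPy sketch below uses an assumed two-camera setup for illustration and is not the paper's optimized implementation.

```python
# Minimal sketch: multi-view triangulation via the direct linear transform.
# The two synthetic cameras below are assumptions for illustration.
import numpy as np

def triangulate_dlt(projections, points2d):
    """projections: list of 3x4 camera matrices; points2d: one (u, v) per view.
    Returns the 3D point minimizing the algebraic DLT error."""
    rows = []
    for P, (u, v) in zip(projections, points2d):
        rows.append(u * P[2] - P[0])  # u * (P row 3) - (P row 1) = 0
        rows.append(v * P[2] - P[1])  # v * (P row 3) - (P row 2) = 0
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]                        # null-space direction
    return X[:3] / X[3]               # dehomogenize

def reprojection_error(P, X, uv):
    x = P @ np.append(X, 1.0)
    return np.linalg.norm(x[:2] / x[2] - uv)

X_true = np.array([0.2, -0.1, 4.0])
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])                  # reference camera
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # shifted baseline
obs = [(P @ np.append(X_true, 1.0)) for P in (P1, P2)]
obs = [x[:2] / x[2] for x in obs]                              # ideal 2D detections
X_hat = triangulate_dlt([P1, P2], obs)
print(X_hat, reprojection_error(P1, X_hat, obs[0]))            # ≈ X_true, ≈ 0
```

With noisy 2D detections, the same reprojection error is the natural score for the filtering step (Step4) in the pipeline above.
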
Arxiv 2025-03-27
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | [9.5] 2503.20168 EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [{'name': 'Sheng Miao, Jiaxin Huang, Dongfeng Bai, Xu Yan, Hongyu Zhou, Yue Wang, Bingbing Liu, Andreas Geiger, Yiyi Liao'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting autonomous driving real-time rendering |
Input: Multiple sparse images 多张稀疏图像 Step 1: Initialize noisy depth predictions 初始化噪声深度预测 Step 2: Process point cloud with 3D CNN 使用3D卷积神经网络处理点云 Step 3: Predict 3D Gaussian properties 预测3D高斯属性 Output: Real-time rendering of urban scenes 实时渲染城市场景 |
| 9.5 | [9.5] 2503.20211 Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors [{'name': 'Weilong Yan, Ming Li, Haipeng Li, Shuwei Shao, Robby T. Tan'}] |
Depth Estimation 深度估计 | v2 Depth Estimation 深度估计 Autonomous Driving 自动驾驶 Robustness 可靠性 |
Input: Monocular images 单目图像 Step1: Synthetic adaptation with motion structure knowledge 合成适应与运动结构知识 Step2: Real adaptation with consistency-reweighting strategy 实际适应与一致性加权策略 Step3: Depth estimation model training 深度估计模型训练 Output: Robust depth predictions 可靠的深度预测 |
| 9.5 | [9.5] 2503.20220 DINeMo: Learning Neural Mesh Models with no 3D Annotations [{'name': 'Weijie Guo, Guofeng Zhang, Wufei Ma, Alan Yuille'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D pose estimation neural mesh models unlabeled data autonomous systems robotics |
Input: Images of objects without 3D annotations 无3D标注的物体图像 Step1: Generate pseudo-correspondence 生成伪对应关系 Step2: Train neural mesh model using pseudo labels 使用伪标签训练神经网格模型 Step3: Evaluate performance on 3D pose estimation 在3D姿态估计上评估性能 Output: Accurate 3D pose estimates 准确的3D姿态估计 |
| 9.5 | [9.5] 2503.20221 TC-GS: Tri-plane based compression for 3D Gaussian Splatting [{'name': 'Taorui Wang, Zitong Yu, Yong Xu'}] |
Neural Rendering 神经渲染 | v2 3D Gaussian Splatting Compression Tri-plane |
Input: Unorganized 3D Gaussian attributes 非结构化的3D高斯属性 Step1: Tri-plane encoding of attributes 属性的三平面编码 Step2: KNN-based decoding for Gaussian distribution 基于KNN的高斯分布解码 Step3: Adaptive wavelet loss for high-frequency details 自适应小波损失处理高频细节 Output: Compressed 3D Gaussian representation 压缩后的3D高斯表示 (see the tri-plane lookup sketch after this table) |
| 9.5 | [9.5] 2503.20519 MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation [{'name': 'Jinnan Chen, Lingting Zhu, Zeyu Hu, Shengju Qian, Yugang Chen, Xin Wang, Gim Hee Lee'}] |
3D Generation 三维生成 | v2 3D generation masked auto-regressive transformer |
Input: 3D data 3D数据 Step1: Pyramid VAE architecture development Pyramid VAE架构开发 Step2: Cascaded MAR generation implementation 级联MAR生成实现 Step3: Training with random masking and auto-regressive denoising 随机掩蔽和自回归去噪训练 Output: High-resolution 3D meshes 高分辨率3D网格 |
| 9.5 | [9.5] 2503.20523 GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving [{'name': 'Lloyd Russell, Anthony Hu, Lorenzo Bertoni, George Fedoseev, Jamie Shotton, Elahe Arani, Gianluca Corrado'}] |
Generative Models for Autonomous Driving 自动驾驶的生成模型 | v2 3D modeling autonomous driving scene simulation generative models |
Input: Structured conditioning parameters 结构化条件参数 Step1: Multi-camera video generation 多摄像头视频生成 Step2: Conditioning on driving scenarios 驾驶场景条件化 Step3: Fine-grained control over agent behavior 代理行为的细粒度控制 Output: High-resolution, temporally consistent videos 高分辨率、时间一致性的视频 |
| 9.5 | [9.5] 2503.20784 FB-4D: Spatial-Temporal Coherent Dynamic 3D Content Generation with Feature Banks [{'name': 'Jinwei Li, Huan-ang Gao, Wenyi Li, Haohan Chi, Chenyu Liu, Chenxi Du, Yiqian Liu, Mingju Gao, Guiyu Zhang, Zongzheng Zhang, Li Yi, Yao Yao, Jingwei Zhao, Hongyang Li, Yikai Wang, Hao Zhao'}] |
3D Generation 三维生成 | v2 4D generation dynamic content generation feature bank spatial-temporal consistency multi-view generation |
Input: Multi-view and frame sequences 多视角和帧序列 Step1: Feature extraction 特征提取 Step2: Feature bank integration 特征库集成 Step3: Temporal generation algorithm 时间生成算法 Step4: Model evaluation 模型评估 Output: Coherent dynamic 3D content 连贯的动态3D内容 |
| 9.2 | [9.2] 2503.19947 Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders [{'name': 'Paul Koch, J\"org Kr\"uger, Ankit Chowdhury, Oliver Heimann'}] |
Depth Estimation 深度估计 | v2 depth understanding vision-guided robotics self-supervised learning |
Input: RGB encoders with depth information RGB编码器与深度信息 Step1: Self-supervised training pipeline 自监督训练管道 Step2: Depth feature extraction 深度特征提取 Step3: Performance evaluation 性能评估 Output: Enhanced RGBD encoder 改进的RGBD编码器 |
| 9.0 | [9.0] 2503.20654 AccidentSim: Generating Physically Realistic Vehicle Collision Videos from Real-World Accident Reports [{'name': 'Xiangwen Zhang, Qian Zhang, Longfei Han, Qiang Qu, Xiaoming Chen'}] |
Autonomous Driving 自动驾驶 | v2 3D reconstruction autonomous driving vehicle collision |
Input: Real-world accident reports 真实世界事故报告 Step1: Extract physical clues from reports 从报告中提取物理线索 Step2: Use physical simulator to replicate trajectories 使用物理模拟器复现碰撞轨迹 Step3: Fine-tune language model for scenario predictions 微调语言模型以预测场景 Output: Physically realistic vehicle collision videos 生成物理真实感的车辆碰撞视频 |
| 8.5 | [8.5] 2503.19953 Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals [{'name': 'Stefan Stojanov, David Wendt, Seungwoo Kim, Rahul Venkatesh, Kevin Feigelis, Jiajun Wu, Daniel LK Yamins'}] |
Autonomous Systems and Robotics 自动驾驶系统与机器人技术 | v2 motion estimation self-supervised learning optical flow |
Input: Video data 视频数据 Step1: Flow and occlusion estimation 流动和遮挡估计 Step2: Optimize counterfactual probes 优化反事实探针 Step3: Model evaluation 模型评估 Output: Motion estimates 运动估计 |
| 8.5 | [8.5] 2503.20011 Hyperdimensional Uncertainty Quantification for Multimodal Uncertainty Fusion in Autonomous Vehicles Perception [{'name': 'Luke Chen, Junyao Wang, Trier Mortlock, Pramod Khargonekar, Mohammad Abdullah Al Faruque'}] |
Autonomous Systems and Robotics 自动驾驶和机器人 | v2 Uncertainty Quantification Autonomous Vehicles Multimodal Fusion 3D Object Detection |
Input: Multimodal sensor inputs 多模态传感器输入 Step1: Feature extraction 特征提取 Step2: Uncertainty quantification 不确定性量化 Step3: Feature fusion 特征融合 Output: Enhanced perception and detection 改进的感知与检测 |
| 8.5 | [8.5] 2503.20235 Leveraging 3D Geometric Priors in 2D Rotation Symmetry Detection [{'name': 'Ahyun Seo, Minsu Cho'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D symmetry detection 3D对称性检测 geometric priors 几何先验 |
Input: 2D images 2D图像 Step1: Predict rotation centers in 3D space 在3D空间中预测旋转中心 Step2: Vertex reconstruction enforcing 3D geometric priors 强制执行3D几何先验的顶点重建 Step3: Project results back to 2D 将结果投影回2D Output: Detected rotation symmetry with enhanced accuracy 检测到的旋转对称性,具有更高的准确性 |
| 8.5 | [8.5] 2503.20268 EGVD: Event-Guided Video Diffusion Model for Physically Realistic Large-Motion Frame Interpolation [{'name': 'Ziran Zhang, Xiaohui Li, Yihao Liu, Yujin Wang, Yueting Chen, Tianfan Xue, Shi Guo'}] |
Image and Video Generation 图像生成与视频生成 | v2 video frame interpolation event cameras diffusion models |
Input: Low-frame-rate RGB frames and event signals Step1: Develop Multi-Modal Motion Condition Generator (MMCG) to integrate motion clues Step2: Fine-tune stable video diffusion (SVD) model with conditions from MMCG Step3: Evaluate generated frames for visual quality and fidelity Output: Physically realistic intermediate video frames |
| 8.5 | [8.5] 2503.20291 CryoSAMU: Enhancing 3D Cryo-EM Density Maps of Protein Structures at Intermediate Resolution with Structure-Aware Multimodal U-Nets [{'name': 'Chenwei Zhang, Anne Condon, Khanh Dao Duc'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D cryo-EM protein structure deep learning |
Input: 3D cryo-EM density maps 3D 冷冻电子显微镜密度图 Step1: Integrate structural information with map features 集成结构信息与图像特征 Step2: Train multimodal U-Net on curated datasets 训练多模态U-Net模型 Step3: Evaluate performance across various metrics 评估各类指标下的性能 Output: Enhanced cryo-EM maps 改进的冷冻电子显微镜图像 |
| 8.5 | [8.5] 2503.20321 Recovering Dynamic 3D Sketches from Videos [{'name': 'Jaeah Lee, Changwoon Choi, Young Min Kim, Jaesik Park'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction dynamic sketches motion analysis video-based 3D reconstruction |
Input: Video frames 视频帧 Step1: Extract 3D point cloud motion guidance 提取3D点云运动引导 Step2: Deform parametric 3D curves 变形参数化的3D曲线 Step3: Optimize motion guidance 优化运动引导 Output: Compact dynamic 3D sketches 输出紧凑的动态3D草图 |
| 8.5 | [8.5] 2503.20652 Imitating Radiological Scrolling: A Global-Local Attention Model for 3D Chest CT Volumes Multi-Label Anomaly Classification [{'name': 'Theo Di Piazza, Carole Lazarus, Olivier Nempont, Loic Boussel'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D CT scans anomaly classification global-local attention |
Input: 3D CT volumes 三维CT体积 Step1: Emulate scrolling behavior 模拟滚动行为 Step2: Global-local attention model development 全局-局部注意力模型开发 Step3: Model evaluation on datasets 在数据集上评估模型 Output: Multi-label anomaly classification results 多标签异常分类结果 |
| 8.5 | [8.5] 2503.20663 ARMO: Autoregressive Rigging for Multi-Category Objects [{'name': 'Mingze Sun, Shiwei Mao, Keyi Chen, Yurun Chen, Shunlin Lu, Jingbo Wang, Junting Dong, Ruqi Huang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D modeling 三维建模 rigging 装配 autoregressive models 自回归模型 |
Input: 3D meshes 三维网格 Step1: Data integration 数据集成 Step2: Autoregressive model development 自回归模型开发 Step3: Skeleton prediction 骨骼预测 Output: Rigged 3D models 装配的三维模型 |
| 8.5 | [8.5] 2503.20682 GLRD: Global-Local Collaborative Reason and Debate with PSL for 3D Open-Vocabulary Detection [{'name': 'Xingyu Peng, Si Liu, Chen Gao, Yan Bai, Beipeng Mu, Xiaofei Wang, Huaxia Xia'}] |
3D Open-Vocabulary Detection 3D开放词汇检测 | v2 3D Open-Vocabulary Detection LiDAR point clouds |
Input: LiDAR point clouds LiDAR点云 Step1: Generate initial detection results 生成初步检测结果 Step2: Analyze scene context 分析场景上下文 Step3: Refine detection using common-sense reasoning 利用常识推理修正检测结果 Step4: Apply balance schemes to improve class representation 应用平衡机制以改善类别表示 Output: Improved, adaptable detection results 改进的具有适应性的检测结果 |
| 8.5 | [8.5] 2503.20746 PhysGen3D: Crafting a Miniature Interactive World from a Single Image [{'name': 'Boyuan Chen, Hanxiao Jiang, Shaowei Liu, Saurabh Gupta, Yunzhu Li, Hao Zhao, Shenlong Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction interactive simulation video generation |
Input: Single image 单一图像 Step1: Estimate 3D shapes 估计三维形状 Step2: Compute physical and lighting properties 计算物理和光照属性 Step3: Generate interactive 3D scene 生成互动的三维场景 Output: Realistic video generation 真实视频生成 |
| 8.5 | [8.5] 2503.20776 Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields [{'name': 'Shijie Zhou, Hui Ren, Yijia Weng, Shuwang Zhang, Zhen Wang, Dejia Xu, Zhiwen Fan, Suya You, Zhangyang Wang, Leonidas Guibas, Achuta Kadambi'}] |
4D Representation and Reconstruction 4D 表示与重建 | v2 4D representation 4D表示 Gaussian Splatting 高斯点云 Monocular Video 单目视频 |
Input: Monocular video 单目视频 Step 1: Dynamic optimization 动态优化 Step 2: Gaussian feature field distillation 高斯特征场蒸馏 Step 3: 4D scene reconstruction 4D场景重建 Output: Interactive 4D agentic AI 交互式4D智能AI |
| 7.5 | [7.5] 2503.20314 Wan: Open and Advanced Large-Scale Video Generative Models [{'name': 'WanTeam, Ang Wang, Baole Ai, Bin Wen, Chaojie Mao, Chen-Wei Xie, Di Chen, Feiwu Yu, Haiming Zhao, Jianxiao Yang, Jianyuan Zeng, Jiayu Wang, Jingfeng Zhang, Jingren Zhou, Jinkai Wang, Jixuan Chen, Kai Zhu, Kang Zhao, Keyu Yan, Lianghua Huang, Mengyang Feng, Ningyi Zhang, Pandeng Li, Pingyu Wu, Ruihang Chu, Ruili Feng, Shiwei Zhang, Siyang Sun, Tao Fang, Tianxing Wang, Tianyi Gui, Tingyu Weng, Tong Shen, Wei Lin, Wei Wang, Wei Wang, Wenmeng Zhou, Wente Wang, Wenting Shen, Wenyuan Yu, Xianzhong Shi, Xiaoming Huang, Xin Xu, Yan Kou, Yangyu Lv, Yifei Li, Yijing Liu, Yiming Wang, Yingya Zhang, Yitong Huang, Yong Li, You Wu, Yu Liu, Yulin Pan, Yun Zheng, Yuntao Hong, Yupeng Shi, Yutong Feng, Zeyinzi Jiang, Zhen Han, Zhi-Fan Wu, Ziyu Liu'}] |
Image and Video Generation 图像与视频生成 | v2 video generation generative models diffusion models |
Input: Large-scale images and videos 大规模图像和视频 Step1: Data curation 数据整理 Step2: Model design and optimization 模型设计与优化 Step3: Benchmarking and evaluation 基准测试与评估 Output: Advanced video generative models 高级视频生成模型 |
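The pipeline cells above name the steps but show no mechanics. As a concrete anchor for the multimodal uncertainty-fusion step in rows such as 2503.20011, here is a minimal NumPy sketch of precision-weighted fusion of independent Gaussian sensor estimates; it is a generic textbook construction under a Gaussian-noise assumption, not that paper's hyperdimensional method, and all names and numbers are illustrative.

```python
import numpy as np

def fuse_gaussian_estimates(means, variances):
    """Precision-weighted fusion of independent Gaussian estimates.

    Each sensor reports a mean and a variance for the same quantity;
    the fused mean weights each sensor by its precision (1 / variance),
    so noisier sensors contribute less.
    """
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / np.asarray(variances, dtype=float)
    fused_var = 1.0 / precisions.sum()
    fused_mean = fused_var * (precisions * means).sum()
    return fused_mean, fused_var

# Toy example: a confident LiDAR range versus a noisier camera estimate.
mean, var = fuse_gaussian_estimates(means=[10.2, 10.8], variances=[0.05, 0.50])
print(f"fused distance: {mean:.2f} (variance {var:.3f})")  # lands close to the LiDAR value
```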
Arxiv 2025-03-26
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | [9.5] 2503.19332 Divide-and-Conquer: Dual-Hierarchical Optimization for Semantic 4D Gaussian Spatting [{'name': 'Zhiying Yan, Yiyuan Liang, Shilv Cai, Tao Zhang, Sheng Zhong, Luxin Yan, Xu Zou'}] |
3D Reconstruction and Modeling 三维重建 | v2 Dynamic Scene Reconstruction Gaussian Splatting |
Input: Dynamic scenes 动态场景 Step1: Data separation 数据分离 Step2: Hierarchical optimization 分层优化 Step3: Gaussian management 高斯管理 Output: Enhanced dynamic scene understanding 改进的动态场景理解 |
| 9.5 | [9.5] 2503.19340 BADGR: Bundle Adjustment Diffusion Conditioned by GRadients for Wide-Baseline Floor Plan Reconstruction [{'name': 'Yuguang Li, Ivaylo Boyadzhiev, Zixuan Liu, Linda Shapiro, Alex Colburn'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction bundle adjustment RGB panorama layout generation |
Input: Wide-baseline RGB panoramas 宽基线RGB全景图 Step1: Camera pose and floor plan initialization 相机位姿和平面布局初始化 Step2: Bundle adjustment and refinement 捆绑调整与优化 Step3: Integration of layout-structural constraints 布局结构约束的整合 Output: Accurate camera poses and floor plans 准确的相机位姿和楼层平面图 |
| 9.5 | [9.5] 2503.19373 DeClotH: Decomposable 3D Cloth and Human Body Reconstruction from a Single Image [{'name': 'Hyeongjin Nam, Donghwan Kim, Jeongtaek Oh, Kyoung Mu Lee'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction human body reconstruction cloth modeling |
Input: Single image 单幅图像 Step1: Utilize 3D template models for regularization 利用三维模板模型进行正则化 Step2: Develop a specialized cloth diffusion model 开发专门的布料扩散模型 Step3: Reconstruct 3D cloth and human body based on templates 基于模板重建三维布料和人体 Output: Decomposed 3D model of cloth and human body 输出:分解的三维布料和人体模型 |
| 9.5 | [9.5] 2503.19443 COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting [{'name': 'Jiaxin Zhang, Junjun Jiang, Youyu Chen, Kui Jiang, Xianming Liu'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D segmentation Gaussian splatting visual quality scene understanding object boundaries |
Input: Multi-view images 多视角图像 Step1: Joint optimization of semantics and visual information 联合优化语义与视觉信息 Step2: Boundary-adaptive Gaussian splitting technique 边界自适应高斯分裂技术 Step3: Texture restoration for visual quality 视觉质量的纹理恢复 Output: Improved segmentation accuracy and clear boundaries 改进的分割精度和清晰边界 |
| 9.5 | [9.5] 2503.19448 Towards Robust Time-of-Flight Depth Denoising with Confidence-Aware Diffusion Model [{'name': 'Changyong He, Jin Zeng, Jiawei Zhang, Jiajie Guo'}] |
Depth Estimation 深度估计 | v2 Depth Denoising Time-of-Flight Diffusion Models 3D Reconstruction |
Input: Raw correlation measurements from ToF sensors 来自ToF传感器的原始相关测量 Step1: Dynamic range normalization 动态范围归一化 Step2: Apply diffusion model for denoising 应用扩散模型进行去噪 Step3: Confidence-aware guidance integration 集成基于置信度的指导 Output: Enhanced depth maps 改进的深度图 |
| 9.5 | [9.5] 2503.19452 SparseGS-W: Sparse-View 3D Gaussian Splatting in the Wild with Generative Priors [{'name': 'Yiqing Li, Xuan Wang, Jiawei Wu, Yikun Ma, Zhi Jin'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting few-shot learning novel view synthesis occlusion handling |
Input: Unconstrained in-the-wild images from various views 来自不同视角的野外图像 Step1: Multi-view stereo for camera parameters 多视角立体视觉技术获取相机参数 Step2: Gaussian optimization with Constrained Novel-View Enhancement 高斯优化与约束新视角增强模块结合 Step3: Occlusion handling to improve view consistency 处理遮挡以提高视角一致性 Output: High-quality novel views of the scene 该场景的高质量新视角 |
| 9.5 | [9.5] 2503.19458 GaussianUDF: Inferring Unsigned Distance Functions through 3D Gaussian Splatting [{'name': 'Shujuan Li, Yu-Shen Liu, Zhizhong Han'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction unsigned distance functions multi-view images 3D Gaussian Splatting |
Input: Multi-view images 多视角图像 Step1: Overfit 2D Gaussian planes on surfaces 在表面上过拟合2D高斯平面 Step2: Use self-supervision and gradient-based inference for UDF supervision 利用自监督和基于梯度的推理进行UDF监督 Step3: Produce continuous UDF representations 生成连续的UDF表示 Output: Accurate reconstruction of open surfaces 精确重建开放表面 |
| 9.5 | [9.5] 2503.19543 Scene-agnostic Pose Regression for Visual Localization [{'name': 'Junwei Zheng, Ruiping Liu, Yufan Chen, Zhenfang Chen, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen'}] |
Visual Odometry 视觉里程计 | v2 Pose Regression 姿态回归 Visual Localization 视觉定位 Camera Poses 相机姿态 |
Input: Sequence of images along a trajectory 图像序列沿着轨迹 Step1: Model input preparation 模型输入准备 Step2: Pose prediction 相机姿态预测 Step3: Evaluation of pose accuracy 姿态精度评估 Output: Predictions of 6D camera poses 6D相机姿态预测 |
| 9.5 | [9.5] 2503.19703 High-Quality Spatial Reconstruction and Orthoimage Generation Using Efficient 2D Gaussian Splatting [{'name': 'Qian Wang, Zhihao Zhan, Jialei He, Zhituo Tu, Xiang Zhu, Jie Yuan'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction digital orthophoto maps 2D Gaussian Splatting depth estimation |
Input: Multi-view images and terrain data 多视角图像及地形数据 Step1: Generate depth maps 生成深度图 Step2: Apply 2D Gaussian Splatting method 应用2D高斯点云方法 Step3: Render True Digital Orthophoto Maps (TDOMs) 渲染真正射影像图(TDOMs) Output: High-quality spatial reconstruction 高质量空间重建 |
| 9.5 | [9.5] 2503.19776 Resilient Sensor Fusion under Adverse Sensor Failures via Multi-Modal Expert Fusion [{'name': 'Konyul Park, Yecheol Kim, Daehun Kim, Jun Won Choi'}] |
Autonomous Systems and Robotics 自动驾驶与机器人 | v2 LiDAR camera sensor fusion 3D object detection autonomous driving |
Input: Multi-modal sensor data 多模态传感器数据 Step1: Integration of LiDAR and camera features LiDAR与相机特征的集成 Step2: Development of Multi-Expert Decoding framework 多专家解码框架的开发 Step3: Performance evaluation on benchmark datasets 基准数据集上的性能评估 Output: Robust 3D object detection results 稳健的三维物体检测结果 |
| 9.5 | [9.5] 2503.19912 SuperFlow++: Enhanced Spatiotemporal Consistency for Cross-Modal Data Pretraining [{'name': 'Xiang Xu, Lingdong Kong, Hui Shuai, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Qingshan Liu'}] |
3D Reconstruction and Modeling 三维重建 | v2 LiDAR representation learning autonomous driving 3D perception spatiotemporal consistency |
Input: Consecutive LiDAR-camera pairs LiDAR-相机配对 Step1: View consistency alignment 视图一致性对齐 Step2: Dense-to-sparse consistency regularization 密集到稀疏一致性正则化 Step3: Flow-based contrastive learning 基于流的对比学习 Step4: Temporal voting strategy 时间投票策略 Output: Enhanced LiDAR-based perception 改进的基于LiDAR的感知 |
| 9.5 | [9.5] 2503.19913 PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model [{'name': 'Mingju Gao, Yike Pan, Huan-ang Gao, Zongzheng Zhang, Wenyi Li, Hao Dong, Hao Tang, Li Yi, Hao Zhao'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 4D reconstruction 四维重建 part-level dynamics 部分级动态 autonomous robotics 自主机器人 |
Input: Multi-view images 多视角图像 Step1: Data integration 数据集成 Step2: 4D reconstruction framework development 四维重建框架开发 Step3: Motion and appearance learning 动作与外观学习 Output: Enhanced representations of part-level dynamics 改进的部分动态表示 |
| 9.5 | [9.5] 2503.19914 Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models [{'name': 'Sangwon Beak, Hyeonwoo Kim, Hanbyul Joo'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D spatial relationships object-object relationships diffusion models synthetic 3D samples |
Input: Synthesized 2D images 从合成的 2D 图像获取数据 Step1: Generate 3D samples from 2D images 从 2D 图像生成 3D 样本 Step2: Train score-based OOR diffusion model 训练基于分数的 OOR 扩散模型 Step3: Extend to multi-object OOR 扩展到多对象 OOR Output: Distributions of spatial relationships 输出空间关系的分布 |
| 9.2 | [9.2] 2503.19011 RomanTex: Decoupling 3D-aware Rotary Positional Embedded Multi-Attention Network for Texture Synthesis [{'name': 'Yifei Feng, Mingxin Yang, Shuhui Yang, Sheng Zhang, Jiaao Yu, Zibo Zhao, Yuhong Liu, Jie Jiang, Chunchao Guo'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D-aware texture generation 3D感知纹理生成 multi-view consistency 多视角一致性 texture synthesis 纹理合成 |
Input: 3D geometries and multi-view images 3D几何体和多视角图像 Step1: Integrate multi-view image information 整合多视角图像信息 Step2: Develop a multi-attention texture synthesis network 开发多注意力纹理合成网络 Step3: Apply geometry-related Classifier-Free Guidance (CFG) 应用与几何相关的无分类器引导 (CFG) Output: High-quality and consistent texture maps 输出: 高质量且一致的纹理图 |
| 9.0 | [9.0] 2503.19207 FRESA:Feedforward Reconstruction of Personalized Skinned Avatars from Few Images [{'name': 'Rong Wang, Fabian Prada, Ziyan Wang, Zhongshi Jiang, Chengxiang Yin, Junxuan Li, Shunsuke Saito, Igor Santesteban, Javier Romero, Rohan Joshi, Hongdong Li, Jason Saragih, Yaser Sheikh'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction human avatars animation feedforward multi-frame aggregation |
Input: Casual phone photos 从手机照片获取输入 Step1: 3D canonicalization 进行三维规范化 Step2: Multi-frame feature aggregation 多帧特征聚合 Step3: Avatar shape and animation inference 推断化身形状和动画 Output: Personalized 3D avatar 生成个性化的三维化身 |
| 8.5 | [8.5] 2503.19157 HOIGPT: Learning Long Sequence Hand-Object Interaction with Language Models [{'name': 'Mingzhen Huang, Fu-Jen Chu, Bugra Tekin, Kevin J Liang, Haoyu Ma, Weiyao Wang, Xingyu Chen, Pierre Gleize, Hongfei Xue, Siwei Lyu, Kris Kitani, Matt Feiszli, Hao Tang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D hand-object interaction 3D手-物体交互 language models 语言模型 |
Input: Text prompts or partial HOI sequences 文本提示或部分HOI序列 Step1: HOI sequence tokenization HOI序列的标记化 Step2: Bidirectional transformation between HOI sequences and text HOI序列与文本间的双向变换 Step3: HOI generation or completion HOI生成或补全 Output: Generated 3D hand-object interaction sequences 生成的3D手-物体交互序列 |
| 8.5 | [8.5] 2503.19199 Open-Vocabulary Functional 3D Scene Graphs for Real-World Indoor Spaces [{'name': 'Chenyangguang Zhang, Alexandros Delitzas, Fangjinhua Wang, Ruida Zhang, Xiangyang Ji, Marc Pollefeys, Francis Engelmann'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D scene graphs functional relationships RGB-D images |
Input: Posed RGB-D images 从RGB-D图像输入 Step1: Predicting objects and interactive elements 预测物体和交互元素 Step2: Inferring functional relationships 推断功能关系 Output: Functional 3D scene graph 功能3D场景图 |
| 8.5 | [8.5] 2503.19276 Context-Aware Semantic Segmentation: Enhancing Pixel-Level Understanding with Large Language Models for Advanced Vision Applications [{'name': 'Ben Rahman'}] |
Semantic Segmentation 语义分割 | v2 Semantic Segmentation Large Language Models Autonomous Driving Context-Aware Systems |
Input: Images with complex scenes 复杂场景中的图像 Step1: Integrate visual features and language embeddings 整合视觉特征和语言嵌入 Step2: Implement a Cross-Attention Mechanism 实现跨注意力机制 Step3: Utilize Graph Neural Networks for object relationships 使用图神经网络处理对象间的关系 Output: Enhanced pixel-level and contextual understanding 改进的像素级和上下文理解 |
| 8.5 | [8.5] 2503.19307 Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation [{'name': 'Zhuoran Zhao, Linlin Yang, Pengzhan Sun, Pan Hui, Angela Yao'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction hand pose estimation synthetic data |
Input: Synthetic and real data 合成和真实数据 Step1: Synthetic data analysis 合成数据分析 Step2: Gap analysis 差距分析 Step3: Data synthesis pipeline proposal 提出数据合成流程 Output: Enhanced hand pose estimation 改进的手部姿态估计 |
| 8.5 | [8.5] 2503.19308 A Comprehensive Analysis of Mamba for 3D Volumetric Medical Image Segmentation [{'name': 'Chaohan Wang, Yutong Xie, Qi Chen, Yuyin Zhou, Qi Wu'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D segmentation 三维分割 medical imaging 医学成像 State Space Models 状态空间模型 |
Input: High-resolution 3D medical images 高分辨率3D医学图像 Step1: Evaluate Mamba against Transformers 评估Mamba与Transformer的对比 Step2: Implement multi-scale representation learning 实现多尺度表征学习 Step3: Benchmark against public datasets 在公开数据集上进行基准测试 Output: Comparative analysis of segmentation performance 输出:分割性能比较分析 |
| 8.5 | [8.5] 2503.19355 ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models [{'name': 'Dohwan Ko, Sihyeon Kim, Yumin Suh, Vijay Kumar B. G, Minseo Yoon, Manmohan Chandraker, Hyunwoo J. Kim'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 spatio-temporal reasoning Vision-Language Models autonomous driving |
Input: Real-world videos with 3D annotations 实际视频与3D注释 Step1: Dataset construction 数据集构建 Step2: Kinematic instruction tuning 运动指令调优 Step3: Model training and evaluation 模型训练与评估 Output: Enhanced Vision-Language Model 改进的视觉语言模型 |
| 8.5 | [8.5] 2503.19358 From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting [{'name': 'Zhiwei Huang, Hailin Yu, Yichun Shentu, Jin Yuan, Guofeng Zhang'}] |
Camera Relocalization 相机重定位 | v2 camera relocalization 3D reconstruction Gaussian splatting |
Input: Query image 查询图像 Step1: Sparse feature extraction 稀疏特征提取 Step2: Initial pose estimation using sparse matching 根据稀疏匹配初步估计位姿 Step3: Dense feature matching for pose refinement 通过密集特征匹配进行位姿精炼 Output: Accurate camera pose result 精确的相机位姿结果 |
| 8.5 | [8.5] 2503.19391 TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception [{'name': 'Zhiying Song, Lei Yang, Fuxi Wen, Jun Li'}] |
Autonomous Systems and Robotics 自动驾驶与机器人系统 | v2 cooperative perception trajectory alignment autonomous driving feature fusion |
Input: Multi-frame LiDAR sequences 多帧激光雷达序列 Step1: Learning feature trajectories 学习特征轨迹 Step2: Generating attention points 生成注意力点 Step3: Aligning features against trajectories 将特征与轨迹对齐 Output: Enhanced cooperative perception 改进的协作感知 |
| 8.5 | [8.5] 2503.19405 Multi-modal 3D Pose and Shape Estimation with Computed Tomography [{'name': 'Mingxiao Tu, Hoijoon Jung, Alireza Moghadam, Jineel Raythatha, Lachlan Allan, Jeremy Hsu, Andre Kyme, Jinman Kim'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D pose estimation 3D姿态估计 shape estimation 形状估计 computed tomography 计算机断层扫描 multi-modal fusion 多模态融合 |
Input: Computed tomography (CT) scans and depth maps 计算机断层扫描(CT)和深度图 Step1: Feature extraction 特征提取 Step2: Probabilistic correspondence alignment 概率对应对齐 Step3: Pose and shape estimation 姿态和形状估计 Step4: Parameter mixing model 参数混合模型 Output: Accurate 3D human mesh model 准确的三维人类网格模型 |
| 8.5 | [8.5] 2503.19721 EventMamba: Enhancing Spatio-Temporal Locality with State Space Models for Event-Based Video Reconstruction [{'name': 'Chengjie Ge, Xueyang Fu, Peng He, Kunyu Wang, Chengzhi Cao, Zheng-Jun Zha'}] |
Video Generation 视频生成 | v2 event-based video reconstruction spatio-temporal locality Mamba computer vision neural networks |
Input: Event data 事件数据 Step1: Implement random window offset strategy 实施随机窗口偏移策略 Step2: Apply Hilbert space filling curve mechanism 应用希尔伯特空间填充曲线机制 Step3: Model evaluation and performance benchmarking 模型评估与性能基准测试 Output: Enhanced reconstructed video frames 改进的视频重建帧 |
| 8.5 | [8.5] 2503.19755 ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation [{'name': 'Haoyu Fu, Diankun Zhang, Zongchuang Zhao, Jianfeng Cui, Dingkang Liang, Chong Zhang, Dingyuan Zhang, Hongwei Xie, Bing Wang, Xiang Bai'}] |
Autonomous Driving 自动驾驶 | v2 autonomous driving vision-language models trajectory prediction |
Input: Vision-language instructed action generation 视觉语言指令的动作生成 Step1: Combine QT-Former for temporal context aggregation 结合QT-Former进行时间上下文聚合 Step2: Utilize LLM for driving scenario reasoning 利用大型语言模型进行驾驶场景推理 Step3: Implement a generative planner for trajectory prediction 实施生成式规划器进行轨迹预测 Output: Enhanced closed-loop driving performance 改进的闭环驾驶性能 |
| 8.5 | [8.5] 2503.19764 OpenLex3D: A New Evaluation Benchmark for Open-Vocabulary 3D Scene Representations [{'name': 'Christina Kassab, Sacha Morin, Martin Büchner, Matías Mattamala, Kumaraditya Gupta, Abhinav Valada, Liam Paull, Maurice Fallon'}] |
3D Scene Representation 三维场景表示 | v2 3D scene representation open-vocabulary benchmark |
Input: 3D scene representations 三维场景表示 Step1: Open-set category labeling 开放集类别标注 Step2: Benchmark dataset creation 基准数据集创建 Step3: Evaluation on semantic segmentation 语义分割评估 Step4: Evaluation on object retrieval 对象检索评估 Output: OpenLex3D benchmark dataset OpenLex3D基准数据集 |
| 8.0 | [8.0] 2503.19654 RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models [{'name': 'Mehdi Moshtaghi, Siavash H. Khajavi, Joni Pajarinen'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models 视觉语言模型 RGB-Thermal understanding RGB-热成像理解 |
Input: RGB-Thermal image pairs RGB-热成像对 Step1: Comprehensive evaluation framework 建立全面评估框架 Step2: Annotation of Yes/No questions 对是/否问题的标注 Step3: Performance evaluation on VLMs 对视觉语言模型的性能评估 Output: Benchmark for assessing VLMs 性能评估基准 |
| 8.0 | [8.0] 2503.19794 PAVE: Patching and Adapting Video Large Language Models [{'name': 'Zhuoming Liu, Yiquan Li, Khoi Duc Nguyen, Yiwu Zhong, Yin Li'}] |
VLM & VLA 视觉语言模型与视觉语言对齐 | v2 Video LLMs 3D reasoning multimodal learning |
Input: Pre-trained Video LLMs and additional side signals 预训练视频LLM和附加信号 Step1: Insert lightweight adapters to adapt to downstream tasks 插入轻量级适配器以适应下游任务 Step2: Fuse video with other signals 融合视频与其他信号 Step3: Evaluate the model across different tasks 评估模型在不同任务上的表现 Output: Improved model performance 改进的模型表现 |
| 7.5 | [7.5] 2503.19325 Long-Context Autoregressive Video Modeling with Next-Frame Prediction [{'name': 'Yuchao Gu, Weijia Mao, Mike Zheng Shou'}] |
Image and Video Generation 图像生成与视频生成 | v2 video generation autoregressive modeling temporal context |
Input: Video data 视频数据 Step 1: Introduce FAR for video autoregressive modeling 引入FAR用于视频自回归建模 Step 2: Implement FlexRoPE for temporal decay 实现FlexRoPE以进行时间衰减 Step 3: Apply long short-term context modeling 应用长短期上下文建模 Output: State-of-the-art video generation 先进的视频生成 |
| 7.5 | [7.5] 2503.19462 AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset [{'name': 'Haiyu Zhang, Xinyuan Chen, Yaohui Wang, Xihui Liu, Yunhong Wang, Yu Qiao'}] |
Image and Video Generation 图像生成与视频生成 | v2 video generation diffusion models synthetic dataset |
Input: Pretrained video diffusion model 预训练视频扩散模型 Step 1: Generate synthetic dataset from denoising trajectories 从去噪轨迹生成合成数据集 Step 2: Design trajectory-based few-step guidance 设计基于轨迹的少步指导 Step 3: Implement adversarial training to align output distribution 实施对抗训练以对齐输出分布 Output: Accelerated video generation 加速视频生成 |
| 7.5 | [7.5] 2503.19839 FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model [{'name': 'Jun Zhou, Jiahao Li, Zunnan Xu, Hanhui Li, Yiji Cheng, Fa-Ting Hong, Qin Lin, Qinglin Lu, Xiaodan Liang'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 image editing vision language models fine-grained editing |
Input: User editing instructions 用户编辑指令 Step1: Integrate region tokens 集成区域标记 Step2: Use VLM for comprehension 使用视觉语言模型进行理解 Step3: Apply diffusion model for editing 应用扩散模型进行编辑 Output: Edited images 生成的编辑图像 |
| 7.5 | [7.5] 2503.19910 CoLLM: A Large Language Model for Composed Image Retrieval [{'name': 'Chuong Huynh, Jinyu Yang, Ashish Tawari, Mubarak Shah, Son Tran, Raffay Hamid, Trishul Chilimbi, Abhinav Shrivastava'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Composed Image Retrieval Vision-Language Models Large Language Models |
Input: Image-caption pairs 图像-字幕对 Step1: Dynamic triplet synthesis 动态三元组合成 Step2: Model training 模型训练 Output: Enhanced composed image retrieval systems 改进的组合图像检索系统 |
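Gaussian splatting appears in many rows above (e.g., 2503.19452, 2503.19458, 2503.19703) but only as a step name. The sketch below shows the rendering primitive they share in its simplest form: front-to-back alpha compositing of isotropic 2D Gaussians onto a grayscale canvas. It is a toy illustration of the compositing rule only, not any listed paper's pipeline; the function and parameter names are ours.

```python
import numpy as np

def splat_gaussians(centers, sigmas, opacities, colors, depths, h=64, w=64):
    """Toy front-to-back compositing of isotropic 2D Gaussian splats.

    Each Gaussian contributes alpha(x) = opacity * exp(-||x - mu||^2 / (2 sigma^2));
    a pixel accumulates image += T * alpha * color while its transmittance
    decays as T *= (1 - alpha).
    """
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    image = np.zeros((h, w))
    transmittance = np.ones((h, w))
    for i in np.argsort(depths):  # nearest splats are composited first
        d2 = (xs - centers[i][0]) ** 2 + (ys - centers[i][1]) ** 2
        alpha = opacities[i] * np.exp(-d2 / (2.0 * sigmas[i] ** 2))
        image += transmittance * alpha * colors[i]
        transmittance *= 1.0 - alpha
    return image

# Two overlapping splats; the nearer one (smaller depth) dominates the overlap.
img = splat_gaussians(centers=[(24.0, 32.0), (40.0, 32.0)], sigmas=[6.0, 8.0],
                      opacities=[0.9, 0.9], colors=[1.0, 0.5], depths=[0.5, 1.0])
print(img.shape, round(float(img.max()), 3))
```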
Arxiv 2025-03-25
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | [9.5] 2503.17467 High Efficiency Wiener Filter-based Point Cloud Quality Enhancement for MPEG G-PCC [{'name': 'Yuxuan Wei, Zehan Wang, Tian Guo, Hao Liu, Liquan Shen, Hui Yuan'}] |
3D Reconstruction and Modeling 三维重建 | v2 point cloud compression Wiener filter 3D reconstruction 三维重建 |
Input: Point clouds 点云 Step1: Introduce basic Wiener filter 基本维纳滤波器引入 Step2: Improve filter with coefficients inheritance and variance-based classification 改善滤波器,引入系数继承和基于方差的分类 Step3: Fast nearest neighbor search using Morton code 快速最近邻搜索,使用Morton编码 Output: Enhanced point cloud quality 改进的点云质量 |
| 9.5 | [9.5] 2503.17486 ProtoGS: Efficient and High-Quality Rendering with 3D Gaussian Prototypes [{'name': 'Zhengqing Gao, Dongting Hu, Jia-Wang Bian, Huan Fu, Yan Li, Tongliang Liu, Mingming Gong, Kun Zhang'}] |
Neural Rendering 神经渲染 | v2 3D Gaussian Splatting novel view synthesis efficient rendering |
Input: Gaussian primitives 高斯原语 Step1: Grouping Gaussians into prototypes 组合同类高斯点为原型 Step2: Clustering using K-means 使用K均值聚类 Step3: Joint optimization of anchor points and prototypes 对锚点和原型进行联合优化 Output: Efficient and high-quality rendering 高效且高质量的渲染 |
| 9.5 | [9.5] 2503.17668 3D Modeling: Camera Movement Estimation and path Correction for SFM Model using the Combination of Modified A-SIFT and Stereo System [{'name': 'Usha Kumari, Shuvendu Rana'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction Structure From Motion camera movement Affine SIFT |
Input: Multi-view images 多视角图像 Step1: Extract matching points 提取匹配点 Step2: Camera rotation estimation 相机旋转估计 Step3: Translation estimation and correction 平移估计与修正 Output: Accurate 3D model creation 准确的三维模型生成 |
| 9.5 | [9.5] 2503.17798 GaussianFocus: Constrained Attention Focus for 3D Gaussian Splatting [{'name': 'Zexu Huang, Min Xu, Stuart Perry'}] |
Neural Rendering 神经渲染 | v2 3D Gaussian Splatting neural rendering photo-realistic rendering |
Input: 3D Gaussian representations 三维高斯表示 Step1: Patch attention algorithm application 局部关注算法应用 Step2: Gaussian constraints implementation 高斯约束实施 Step3: Subdivision strategy for large scenes 大场景分割策略 Output: Enhanced rendering quality 改进的渲染质量 |
| 9.5 | [9.5] 2503.17814 LightLoc: Learning Outdoor LiDAR Localization at Light Speed [{'name': 'Wen Li, Chen Liu, Shangshu Yu, Dunqiang Liu, Yin Zhou, Siqi Shen, Chenglu Wen, Cheng Wang'}] |
Autonomous Driving 自动驾驶 | v2 LiDAR localization SLAM autonomous driving |
Input: LiDAR data LiDAR数据 Step1: Sample classification guidance 样本分类指导 Step2: Redundant sample downsampling 冗余样本下采样 Step3: Integration into SLAM and model evaluation 集成到SLAM并进行模型评估 Output: Fast-trainable localization model 快速可训练的定位模型 |
| 9.5 | [9.5] 2503.17856 ClaraVid: A Holistic Scene Reconstruction Benchmark From Aerial Perspective With Delentropy-Based Complexity Profiling [{'name': 'Radu Beche, Sergiu Nedevschi'}] |
3D Reconstruction 三维重建 | v2 3D reconstruction aerial imagery dataset creation scene complexity |
Input: Aerial imagery from UAV 多视角无人机图像 Step1: Dataset creation 数据集创建 Step2: Scene complexity profiling 场景复杂度分析 Step3: Benchmarking reconstruction methods 重建方法基准测试 Output: High-quality synthetic dataset 高质量合成数据集 |
| 9.5 | [9.5] 2503.17973 PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos [{'name': 'Hanxiao Jiang, Hao-Yu Hsu, Kaifeng Zhang, Hsin-Ni Yu, Shenlong Wang, Yunzhu Li'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction physics-informed models robotic motion planning deformable objects real-time simulation |
Input: Sparse videos of deformable objects 变形物体的稀疏视频 Step1: Develop physics-informed representation 开发物理信息表示 Step2: Integrate inverse modeling framework 整合反向建模框架 Step3: Optimize geometry and physical properties 优化几何和物理属性 Output: Interactive digital twin 交互式数字孪生 |
| 9.5 | [9.5] 2503.18007 SymmCompletion: High-Fidelity and High-Consistency Point Cloud Completion with Symmetry Guidance [{'name': 'Hongyu Yan, Zijun Li, Kunming Luo, Li Lu, Ping Tan'}] |
Point Cloud Processing 点云处理 | v2 Point cloud completion 3D reconstruction symmetry guidance |
Input: Partial point clouds Step1: Local Symmetry Transformation Network (LSTNet) estimates point-wise local symmetry transformations. Step2: Generate geometry-aligned partial-missing pairs and initial point clouds. Step3: Symmetry-Guidance Transformer (SGFormer) refines the initial point clouds using geometric features. Output: High-fidelity and geometry-consistency final point clouds. |
| 9.5 | [9.5] 2503.18100 M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving [{'name': 'Xuesong Chen, Shaoshuai Shi, Tao Ma, Jingqiu Zhou, Simon See, Ka Chun Cheung, Hongsheng Li'}] |
3D Detection 三维检测 | v2 3D detection 三维检测 autonomous driving 自动驾驶 multi-task learning 多任务学习 |
Input: Multimodal data from sensors and cameras 多模态传感器和相机数据 Step1: Feature extraction from images and LiDAR features 从图像和LiDAR特征提取 Step2: Modality-Adaptive Feature Integration (MAFI) module implementation 实现模态自适应特征集成(MAFI)模块 Step3: Task-specific query initialization for detection and segmentation 目标检测和分割的任务特定查询初始化 Step4: Shared BEV features transformation through multi-layer decoders 共享BEV特征的多层解码器变换 Output: Enhanced detection, segmentation, and occupancy prediction results 改进的检测、分割和占用预测结果 |
| 9.5 | [9.5] 2503.18135 MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation [{'name': 'Jiaxin Huang, Runnan Chen, Ziwen Li, Zhengqing Gao, Xiao He, Yandong Guo, Mingming Gong, Tongliang Liu'}] |
3D Reasoning Segmentation 3D推理分割 | v2 3D reasoning segmentation multimodal learning user intent |
Input: Multi-view images and text queries 多视角图像和文本查询 Step1: Generate multi-view pseudo segmentation masks 生成多视角伪分割掩模 Step2: Unproject 2D masks into 3D space 将2D掩模投影到3D空间 Step3: Align masks with text embeddings 将掩模与文本嵌入对齐 Step4: Implement spatial consistency strategy 实施空间一致性策略 Output: Coherent 3D segmentation masks 输出一致的3D分割掩模 |
| 9.5 | [9.5] 2503.18361 NeRFPrior: Learning Neural Radiance Field as a Prior for Indoor Scene Reconstruction [{'name': 'Wenyuan Zhang, Emily Yue-ting Jia, Junsheng Zhou, Baorui Ma, Kanle Shi, Yu-Shen Liu'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction neural radiance fields multi-view consistency surface reconstruction signed distance function |
Input: Multi-view RGB images 多视角RGB图像 Step1: Learn neural radiance fields using volume rendering 学习使用体积渲染的神经辐射场 Step2: Impose multi-view consistency constraint 强加多视角一致性约束 Step3: Infer signed distance fields (SDF) 推断有符号距离场 Step4: Evaluate surface reconstruction against benchmarks 评估表面重建结果 |
| 9.5 | [9.5] 2503.18363 MonoInstance: Enhancing Monocular Priors via Multi-view Instance Alignment for Neural Rendering and Reconstruction [{'name': 'Wenyuan Zhang, Yixiao Yang, Han Huang, Liang Han, Kanle Shi, Yu-Shen Liu'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction neural rendering monocular depth multi-view uncertainty |
Input: Multi-view images 多视角图像 Step1: Segment multi-view images into consistent instances 将多视角图像分割为一致的实例 Step2: Back-project and align estimated depth values 将估计的深度值反投影并对齐 Step3: Evaluate point density to measure uncertainty 评估点密度以测量不确定性 Output: Uncertainty maps and enhanced geometric priors 不确定性图和增强几何先验 |
| 9.5 | [9.5] 2503.18368 MoST: Efficient Monarch Sparse Tuning for 3D Representation Learning [{'name': 'Xu Han, Yuan Tang, Jinfeng Xu, Xianzhi Li'}] |
3D Representation Learning 3D表示学习 | v2 3D representation learning parameter-efficient fine-tuning point clouds |
Input: 3D point clouds 3D点云 Step1: Parameter-efficient fine-tuning using structured matrices 使用结构化矩阵进行参数高效微调 Step2: Model training and evaluation 模型训练与评估 Output: Enhanced representation for 3D tasks 改进的3D任务表示 |
| 9.5 | [9.5] 2503.18402 DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds [{'name': 'Youyu Chen, Junjun Jiang, Kui Jiang, Xiao Tang, Zhihao Li, Xianming Liu, Yinyu Nie'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting 3D高斯点云 optimization 优化 rendering 渲染 |
Input: 3D scenes 3D场景 Step1: Optimization complexity analysis 优化复杂度分析 Step2: Scheduling rendering resolution 渲染分辨率调度 Step3: Adaptive primitive growth 自适应基元增长 Output: Accelerated 3D Gaussian Splatting model 加速的3D高斯点云模型 |
| 9.5 | [9.5] 2503.18438 ReconDreamer++: Harmonizing Generative and Reconstructive Models for Driving Scene Representation [{'name': 'Guosheng Zhao, Xiaofeng Wang, Chaojun Ni, Zheng Zhu, Wenkang Qin, Guan Huang, Xingang Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction autonomous driving |
Input: Multi-view images 多视角图像 Step1: Domain gap mitigation 域间隙缓解 Step2: Spatial deformation learning 空间变形学习 Step3: 3D Gaussian modeling 三维高斯建模 Output: Improved driving scene representation 改进的驾驶场景表示 |
| 9.5 | [9.5] 2503.18458 StableGS: A Floater-Free Framework for 3D Gaussian Splatting [{'name': 'Luchao Wang, Qian Ren, Kaiming He, Hua Wang, Zhi Chen, Yaohua Tang'}] |
Neural Rendering 神经渲染 | v2 3D Gaussian Splatting novel view synthesis floater artifacts |
Input: 3D Gaussian Splatting data 3D高斯点云数据 Step1: Analyze gradient vanishing 梯度消失分析 Step2: Develop cross-view depth consistency constraints 开发视图间深度一致性约束 Step3: Integrate a dual-opacity model 集成双透明度模型 Output: Enhanced novel view synthesis results 改进的新视图合成结果 |
| 9.5 | [9.5] 2503.18461 MuMA: 3D PBR Texturing via Multi-Channel Multi-View Generation and Agentic Post-Processing [{'name': 'Lingting Zhu, Jingrui Ye, Runze Zhang, Zeyu Hu, Yingda Yin, Lanjiong Li, Jinnan Chen, Shengju Qian, Xin Wang, Qingmin Liao, Lequan Yu'}] |
3D Generation 三维生成 | v2 3D PBR Texturing Multi-Channel Generation Agentic Post-Processing |
Input: Untextured mesh and user inputs 未纹理化网格与用户输入 Step1: Multi-channel multi-view generation 多通道多视角生成 Step2: Agentic post-processing 代理后处理 Output: High-fidelity PBR textures 高保真物理基础渲染纹理 |
| 9.5 | [9.5] 2503.18476 Global-Local Tree Search for Language Guided 3D Scene Generation [{'name': 'Wei Deng, Mengshi Qi, Huadong Ma'}] |
3D Scene Generation 3D场景生成 | v2 3D indoor scene generation Vision-Language Models (VLMs) tree search algorithm |
Input: User-provided scene descriptions 用户提供的场景描述 Step1: Hierarchical scene representation construction 层次场景表示构建 Step2: Global-local tree search algorithm application 全局-局部树搜索算法应用 Step3: Object placement using VLM object recognition 使用VLM对象识别进行物体放置 Output: Realistic 3D indoor scenes 真实的室内3D场景 |
| 9.5 | [9.5] 2503.18527 AIM2PC: Aerial Image to 3D Building Point Cloud Reconstruction [{'name': 'Soulaimene Turki, Daniel Panangian, Houda Chaabouni-Chouayakh, Ksenia Bittner'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction point cloud building reconstruction aerial image |
Input: Single aerial image 单幅航空图像 Step1: Feature extraction 特征提取 Step2: Concatenate additional conditions 连接附加条件 Step3: Point cloud diffusion modeling 点云扩散建模 Output: Complete 3D building point cloud 生成完整的三维建筑点云 |
| 9.5 | [9.5] 2503.18557 LeanStereo: A Leaner Backbone based Stereo Network [{'name': 'Rafia Rahim, Samuel Woerz, Andreas Zell'}] |
Stereo Matching 立体匹配 | v2 Stereo Matching Depth Estimation 3D Reconstruction |
Input: Rectified stereo images 经过校正的立体图像 Step1: Feature extraction 特征提取 Step2: Cost volume integration 成本体积集成 Step3: Disparity regression 视差回归 Output: Depth map 深度图 |
| 9.5 | [9.5] 2503.18640 LLGS: Unsupervised Gaussian Splatting for Image Enhancement and Reconstruction in Pure Dark Environment [{'name': 'Haoran Wang, Jingwei Huang, Lu Yang, Tianchen Deng, Gaojing Zhang, Mingrui Li'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting multi-view optimization low-light enhancement |
Input: Low-light images 低光照图像 Step1: Gaussian representation decomposition 高斯表示分解 Step2: Image enhancement 图像增强 Step3: Multi-view consistency optimization 多视角一致性优化 Output: Enhanced 3D models 改进的三维模型 |
| 9.5 | [9.5] 2503.18671 Structure-Aware Correspondence Learning for Relative Pose Estimation [{'name': 'Yihan Chen, Wenfei Yang, Huan Ren, Shifeng Zhang, Tianzhu Zhang, Feng Wu'}] |
3D Reconstruction and Modeling 三维重建 | v2 Relative Pose Estimation 相对姿态估计 3D Correspondences 3D对应 Keypoint Extraction 关键点提取 Structure-Aware 结构感知 |
Input: Query and reference images 查询与参考图像 Step1: Structure-aware keypoint extraction module 结构感知关键点提取模块 Step2: Structure-aware correspondence estimation module 结构感知对应估计模块 Step3: 3D-3D correspondence establishment 3D-3D对应建立 Output: Estimated relative pose 估计的相对姿态 |
| 9.5 | [9.5] 2503.18682 Hardware-Rasterized Ray-Based Gaussian Splatting [{'name': 'Samuel Rota Bulò, Nemanja Bartolovic, Lorenzo Porzi, Peter Kontschieder'}] |
Neural Rendering 神经渲染 | v2 3D Gaussian Splatting ray-based rendering virtual reality |
Input: 3D Gaussian primitives 3D 高斯原语 Step1: Mathematical derivation 数学推导 Step2: Efficient rendering techniques 高效渲染技术 Step3: Performance evaluation 性能评估 Output: High-quality rendering output 高质量渲染输出 |
| 9.5 | [9.5] 2503.18794 NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting [{'name': 'Yulong Zheng, Zicheng Jiang, Shengfeng He, Yandu Sun, Junyu Dong, Huaidong Zhang, Yong Du'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction sparse view synthesis Neural Radiance Fields 3D Gaussian Splatting |
Input: Sparse-view images 稀疏视图图像 Step 1: Depth computation using optical flow and camera poses 使用光流和相机姿态进行深度计算 Step 2: Point cloud densification 点云密化 Step 3: Model evaluation and comparison 模型评估与比较 Output: Enhanced novel view synthesis 输出:改进的视图合成 |
| 9.5 | [9.5] 2503.18897 Online 3D Scene Reconstruction Using Neural Object Priors [{'name': 'Thomas Chabal, Shizhe Chen, Jean Ponce, Cordelia Schmid'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction neural implicit representations |
Input: RGB-D video sequence RGB-D视频序列 Step1: Extract object masks and camera poses 提取物体掩模和相机姿态 Step2: Continuous optimization of object representation 对物体表示进行连续优化 Step3: Utilize shape priors from object library 利用物体库中的形状先验 Output: Online reconstructed 3D scene 在线重建的3D场景 |
| 9.5 | [9.5] 2503.18945 Aether: Geometric-Aware Unified World Modeling [{'name': 'Aether Team, Haoyi Zhu, Yifan Wang, Jianjun Zhou, Wenzheng Chang, Yang Zhou, Zizun Li, Junyi Chen, Chunhua Shen, Jiangmiao Pang, Tong He'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 4D reconstruction autonomous systems |
Input: Synthetic 4D video data 合成的4D视频数据 Step1: Data annotation 数据标注 Step2: Multi-task optimization 多任务优化 Step3: Model training and evaluation 模型训练与评估 Output: Unified world model with geometric reasoning 具有几何推理的统一世界模型 |
| 9.2 | [9.2] 2503.17712 Multi-modality Anomaly Segmentation on the Road [{'name': 'Heng Gao, Zhuolin He, Shoumeng Qiu, Xiangyang Xue, Jian Pu'}] |
Autonomous Systems and Robotics 自动驾驶 | v2 anomaly segmentation autonomous driving multi-modal |
Input: Road images with anomalies 具有异常的路面图像 Step1: Text-modal extraction using CLIP 通过CLIP提取文本模态 Step2: Anomaly score computation 计算异常得分 Step3: Ensemble boosting of scores 加权平均多个得分 Output: Anomaly segmentation map 异常分割图 |
| 9.2 | [9.2] 2503.18052 SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining [{'name': 'Yue Li, Qi Ma, Runyi Yang, Huapeng Li, Mengjiao Ma, Bin Ren, Nikola Popovic, Nicu Sebe, Ender Konukoglu, Theo Gevers, Luc Van Gool, Martin R. Oswald, Danda Pani Paudel'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting 3D scene understanding self-supervised learning |
Input: 3D Gaussian Splatting (3DGS) data 3D高斯点云数据 Step 1: Dataset creation (SceneSplat-7K) 数据集创建(SceneSplat-7K) Step 2: Model training with vision-language pretraining 通过视觉语言预训练训练模型 Step 3: Evaluate performance on segmentation benchmarks 在分割基准上评估性能 Output: Enhanced understanding of 3D scenes 改进的3D场景理解 |
| 9.2 | [9.2] 2503.18107 PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding [{'name': 'Hongjia Zhai, Hai Li, Zhenzhe Li, Xiaokun Pan, Yijia He, Guofeng Zhang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D panoptic segmentation 3D Gaussian Splatting scene understanding language-guided segmentation |
Input: Multi-view posed images 多视角图像 Step1: Model continuous parametric feature space 建模连续参数特征空间 Step2: Use 3D feature decoder 采用三维特征解码器 Step3: Perform graph clustering based segmentation 进行基于图聚类的分割 Output: 3D consistent instance segmentation 三维一致实例分割 |
| 9.2 | [9.2] 2503.18155 Decorum: A Language-Based Approach For Style-Conditioned Synthesis of Indoor 3D Scenes [{'name': 'Kelly O. Marshall, Omid Poursaeed, Sergiu Oprea, Amit Kumar, Anushrut Jignasu, Chinmay Hegde, Yilei Li, Rakesh Ranjan'}] |
3D Generation 三维生成 | v2 3D scene generation natural language processing multimodal learning |
Input: User-generated prompts 用户生成的提示 Step 1: Text to dense annotation 文本转为密集注释 Step 2: Layout design for objects 设计对象布局 Step 3: Furniture selection from inventory 从库存中选择家具 Output: Structured 3D indoor scenes 输出结构化三维室内场景 |
| 9.2 | [9.2] 2503.18254 Surface-Aware Distilled 3D Semantic Features [{'name': 'Lukas Uzolas, Elmar Eisemann, Petr Kellnhofer'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction semantic features |
Input: Training with unpaired 3D meshes 使用无配对3D网格进行训练 Step1: Learning a surface-aware embedding space 学习表面感知嵌入空间 Step2: Implementing a contrastive loss to improve feature distinction 实施对比损失以提高特征区分 Output: Robust 3D features for various applications 输出: 适用于多种应用的稳健3D特征 |
| 9.0 | [9.0] 2503.17574 Is there anything left? Measuring semantic residuals of objects removed from 3D Gaussian Splatting [{'name': 'Simona Kocour, Assia Benbihi, Aikaterini Adam, Torsten Sattler'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction semantic residuals object removal privacy-preserving mapping 3D Gaussian Splatting |
Input: 3D scenes with objects 3D场景与对象 Step1: Evaluation of object removal methods 对对象删除方法进行评估 Step2: Measurement of semantic residuals 语义残余的测量 Step3: Refinement of object removal results 根据空间和语义一致性优化删除结果 Output: Evaluated removal quality and refined scenes 评估删除质量和优化后场景 |
| 9.0 | [9.0] 2503.18083 Unified Geometry and Color Compression Framework for Point Clouds via Generative Diffusion Priors [{'name': 'Tianxin Huang, Gim Hee Lee'}] |
3D Reconstruction and Modeling 三维重建 | v2 point cloud compression generative diffusion models 3D modeling autonomous driving |
Input: 3D point clouds with color attributes 具有颜色属性的3D点云 Step1: Adaptation of pre-trained generative diffusion model 适应预训练生成扩散模型 Step2: Compression using prompt tuning 使用提示调优进行压缩 Step3: Data encoding into sparse sets 将数据编码为稀疏集合 Step4: Decompression through denoising steps 通过去噪步骤进行解压缩 Output: Compressed and decompressed point clouds 压缩和解压缩的点云 |
| 9.0 | [9.0] 2503.18944 DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation [{'name': 'Karim Abou Zeid, Kadir Yilmaz, Daan de Geus, Alexander Hermans, David Adrian, Timm Linder, Bastian Leibe'}] |
3D Segmentation 3D分割 | v2 3D segmentation 3D分割 2D foundation models 2D基础模型 semantic segmentation 语义分割 |
Input: 2D foundation model features 2D基础模型特征 Step1: Feature extraction 特征提取 Step2: 2D to 3D projection 2D到3D投影 Step3: Integration into 3D segmentation model 集成到3D分割模型中 Output: Enhanced 3D segmentation performance 改进的3D分割性能 |
| 8.5 | [8.5] 2503.17406 IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes [{'name': 'Haochen Zhang, Nader Zantout, Pujith Kachana, Ji Zhang, Wenshan Wang'}] |
3D Scene Understanding 3D 场景理解 | v2 3D scenes referential grounding benchmark dataset multimodal integration interactive navigation |
Input: 3D scanned rooms 3D 扫描房间 Step1: Dataset curation 数据集策划 Step2: Model evaluation 模型评估 Step3: Baseline development 基线开发 Output: Resource for interactive navigation systems 为交互导航系统提供资源 |
| 8.5 | [8.5] 2503.17415 Enhancing Subsequent Video Retrieval via Vision-Language Models (VLMs) [{'name': 'Yicheng Duan, Xi Huang, Duo Chen'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models (VLMs) Video Retrieval 视频检索 Contextual Relationships 上下文关系 |
Input: Video segments 视频段 Step1: Embed video frames using VLM 使用视觉语言模型(VLM)嵌入视频帧 Step2: Combine embeddings with contextual metadata 结合嵌入与上下文元数据 Step3: Implement vector similarity search with graph structures 实现与图结构结合的向量相似性搜索 Output: Refined video retrieval results 精细化的视频检索结果 |
| 8.5 | [8.5] 2503.17499 Event-Based Crossing Dataset (EBCD) [{'name': 'Joey Mulé, Dhandeep Challagundla, Rachit Saini, Riadul Islam'}] |
Event-based Vision 事件视觉 | v2 Event-based vision 事件视觉 object detection 目标检测 autonomous systems 自主系统 |
Input: Event-based images 事件图像 Step1: Data capture using multi-thresholding 多阈值数据捕获 Step2: Object detection using CNNs 使用卷积神经网络进行目标检测 Step3: Performance evaluation against traditional datasets 性能评估与传统数据集对比 Output: Enhanced dataset for event-based detection 改进的事件检测数据集 |
| 8.5 | [8.5] 2503.17539 Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks [{'name': 'Bhishma Dedhia, David Bourgin, Krishna Kumar Singh, Yuheng Li, Yan Kang, Zhan Xu, Niraj K. Jha, Yuchen Liu'}] |
Image and Video Generation 图像生成与视频生成 | v2 video generation parallel inference Diffusion Transformers temporal consistency |
Input: Short videos 短视频 Step1: Encoding video chunks 编码视频块 Step2: Parallel inference of video segments 视频段落的并行推理 Step3: Denoising video chunks 去噪视频块 Output: Long photorealistic videos 长时逼真视频 |
| 8.5 | [8.5] 2503.17695 MotionDiff: Training-free Zero-shot Interactive Motion Editing via Flow-assisted Multi-view Diffusion [{'name': 'Yikun Ma, Yiqing Li, Jiawei Wu, Zhi Jin'}] |
Multi-view and Stereo Vision 多视角立体视觉 | v2 motion editing multi-view consistency generative models optical flow |
Input: Static scene and user-selected motion priors 静态场景和用户选择的运动先验 Step1: Multi-view Flow Estimation Stage (MFES) 多视角流估计阶段 Step2: Point Kinematic Model (PKM) to estimate optical flows 使用点运动模型估计光流 Step3: Multi-view Motion Diffusion Stage (MMDS) to generate motion results 多视角运动扩散阶段生成运动结果 Output: Consistent multi-view motion results 一致的多视角运动结果 |
| 8.5 | [8.5] 2503.17752 HiLoTs: High-Low Temporal Sensitive Representation Learning for Semi-Supervised LiDAR Segmentation in Autonomous Driving [{'name': 'R. D. Lin, Pengcheng Weng, Yinqiao Wang, Han Ding, Jinsong Han, Fei Wang'}] |
LiDAR Segmentation 激光雷达分割 | v2 LiDAR segmentation autonomous driving semi-supervised learning |
Input: Continuous LiDAR frames 连续的激光雷达帧 Step1: Learn high and low temporal sensitivity representations 学习高低时间敏感性表示 Step2: Enhance representations with cross-attention 使用交叉注意力机制增强表示 Step3: Teacher-student framework alignment 在标签和未标签分支上对齐表示 Output: Segmentation results based on LiDAR frames 基于激光雷达帧的分割结果 |
| 8.5 | [8.5] 2503.17788 Aligning Foundation Model Priors and Diffusion-Based Hand Interactions for Occlusion-Resistant Two-Hand Reconstruction [{'name': 'Gaoge Han, Yongkang Cheng, Zhe Chen, Shaoli Huang, Tongliang Liu'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D hand reconstruction occlusion handling multimodal prior integration diffusion models fusion alignment |
Input: Monocular images 单目图像 Step1: Learn to align fused multimodal priors (keypoints, segmentation maps, depth cues) from foundation models during training 训练期间学习对齐融合的多模态先验(关键点、分割图、深度线索) Step2: Employ a two-hand diffusion model to correct interpenetration artifacts 应用双手扩散模型以修正穿透伪影 Output: Occlusion-resistant two-hand reconstruction 具抗遮挡能力的双手重建 |
| 8.5 | [8.5] 2503.17938 Selecting and Pruning: A Differentiable Causal Sequentialized State-Space Model for Two-View Correspondence Learning [{'name': 'Xiang Fang, Shihua Zhang, Hao Zhang, Tao Lu, Huabing Zhou, Jiayi Ma'}] |
Multi-view and Stereo Vision 多视角和立体视觉 | v2 correspondence learning two-view matching SFM pose estimation |
Input: Two-view image pairs 两幅图像对 Step1: Develop correspondence filter 研发对应过滤器 Step2: Implement causal sequence learning 实现因果序列学习 Step3: Integrate local-context enhancement module 集成局部上下文增强模块 Step4: Evaluate performance on relative pose estimation 评估相对姿态估计的性能 Output: Enhanced matching accuracy 提升匹配精度 |
| 8.5 | [8.5] 2503.17982 Co-SemDepth: Fast Joint Semantic Segmentation and Depth Estimation on Aerial Images [{'name': 'Yara AlaaEldin, Francesca Odone'}] |
Depth Estimation 深度估计 | v2 depth estimation semantic segmentation aerial images autonomous navigation |
Input: Aerial images from monocular cameras 单目相机的航拍图像 Step1: Joint architecture design 结构设计 Step2: Depth estimation map prediction 深度估计图的预测 Step3: Semantic segmentation map prediction 语义分割图的预测 Output: Depth and semantic segmentation maps 深度和语义分割图 |
| 8.5 | [8.5] 2503.17992 Geometric Constrained Non-Line-of-Sight Imaging [{'name': 'Xueying Liu, Lianfang Wang, Jun Liu, Yong Wang, Yuping Duan'}] |
3D Reconstruction and Modeling 三维重建 | v2 Non-line-of-sight imaging 3D reconstruction surface normal geometric constraint |
Input: Non-line-of-sight (NLOS) data 非视距(NLOS)数据 Step1: Joint estimation of normals and albedo 法线与反照率的联合估计 Step2: Apply Frobenius norm regularization 应用弗罗贝尼乌斯范数正则化 Step3: High-precision surface reconstruction 高精度表面重建 Output: Accurate geometry of hidden objects 隐藏物体的准确几何形状 |
| 8.5 | [8.5] 2503.18016 Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook [{'name': 'Xu Zheng, Ziqiao Weng, Yuanhuiyi Lyu, Lutao Jiang, Haiwei Xue, Bin Ren, Danda Paudel, Nicu Sebe, Luc Van Gool, Xuming Hu'}] |
Image and Video Generation 图像生成与视频生成 | v2 Retrieval-Augmented Generation computer vision 3D generation |
Input: A comprehensive overview of retrieval-augmented generation techniques in computer vision 计算机视觉中的检索增强生成技术概述 Step1: Review of visual understanding tasks 视觉理解任务评估 Step2: Examination of visual generation applications 视觉生成应用调查 Step3: Proposal of future research directions 提出未来研究方向 Output: Insights into RAG applications in 3D generation and embodied AI 3D生成和实体AI中的RAG应用见解 |
| 8.5 | [8.5] 2503.18073 PanopticSplatting: End-to-End Panoptic Gaussian Splatting [{'name': 'Yuxuan Xie, Xuan Yu, Changjian Jiang, Sitong Mao, Shunbo Zhou, Rui Fan, Rong Xiong, Yue Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction Gaussian splatting |
Input: Multi-view images 多视角图像 Step1: Gaussian segmentation 高斯分割 Step2: Label blending 标签混合 Step3: Cross attention mechanism 交叉注意机制 Output: Consistent 3D panoptic segments 一致的三维全景分段 |
| 8.5 | [8.5] 2503.18177 Training A Neural Network For Partially Occluded Road Sign Identification In The Context Of Autonomous Vehicles [{'name': 'Gulnaz Gimaletdinova, Dim Shaiakhmetov, Madina Akpaeva, Mukhammadmuso Abduzhabbarov, Kadyrmamat Momunov'}] |
Robotic Perception 机器人感知 | v2 traffic sign recognition partial occlusion autonomous vehicles CNN |
Input: Dataset of road sign images with occlusions 道路标志图像数据集(包含遮挡) Step1: Data collection 数据收集 Step2: Model training using CNN models 模型训练使用卷积神经网络 Step3: Comparison of models against transfer learning models 模型与迁移学习模型比较 Output: Performance metrics of models 模型性能指标 |
| 8.5 | [8.5] 2503.18283 Voxel-based Point Cloud Geometry Compression with Space-to-Channel Context [{'name': 'Bojun Liu, Yangzhi Ma, Ao Luo, Li Li, Dong Liu'}] |
Point Cloud Processing 点云处理 | v2 Point cloud compression 点云压缩 Sparse convolution 稀疏卷积 |
Input: Point cloud data 点云数据 Step1: Context model development 上下文模型开发 Step2: Geometry residual coding development 几何残差编码开发 Step3: Performance evaluation 性能评估 Output: Compressed point cloud representation 压缩点云表示 |
| 8.5 | [8.5] 2503.18328 TensoFlow: Tensorial Flow-based Sampler for Inverse Rendering [{'name': 'Chun Gu, Xiaofei Wei, Li Zhang, Xiatian Zhu'}] |
Inverse Rendering 逆向渲染 | v2 inverse rendering importance sampling 3D reconstruction multi-view images |
Input: Multi-view images 多视角图像 Step1: Importance sampling 重要性采样 Step2: Sampler learning 学习采样器 Step3: Scene representation 场景表示 Output: Enhanced rendering outputs 改进的渲染输出 |
| 8.5 | [8.5] 2503.18341 PS-EIP: Robust Photometric Stereo Based on Event Interval Profile [{'name': 'Kazuma Kitazawa, Takahito Aoto, Satoshi Ikehata, Tsuyoshi Takatani'}] |
3D Reconstruction and Modeling 三维重建 | v2 Photometric Stereo 光度立体 Event Camera 事件摄像机 3D Reconstruction 三维重建 |
Input: Event data from an event camera 事件摄像头的数据 Step1: Formulate event interval profiles 形成事件间隔剖面 Step2: Introduce outlier detection based on profile shape 引入基于剖面形状的异常值检测 Step3: Estimate surface normals using the derived profiles 使用推导的剖面估计表面法线 Output: Robustly estimated surface normals 可靠的表面法线估计 |
| 8.5 | [8.5] 2503.18384 LiDAR Remote Sensing Meets Weak Supervision: Concepts, Methods, and Perspectives [{'name': 'Yuan Gao, Shaobo Xia, Pu Wang, Xiaohuan Xi, Sheng Nie, Cheng Wang'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 LiDAR remote sensing weakly supervised learning 3D reconstruction point clouds |
Input: LiDAR data and annotations LiDAR数据和注释 Step1: Review of LiDAR interpretation and inversion LiDAR解译和反演的研究现状 Step2: Summary of weakly supervised techniques 弱监督技术的总结 Step3: Discussion of future research directions 未来研究方向的讨论 Output: Comprehensive review of LiDAR remote sensing 综述LiDAR遥感 |
| 8.5 | [8.5] 2503.18393 PDDM: Pseudo Depth Diffusion Model for RGB-PD Semantic Segmentation Based in Complex Indoor Scenes [{'name': 'Xinhua Xu, Hong Liu, Jianbing Wu, Jinfu Liu'}] |
Image and Video Generation 图像生成 | v2 RGB segmentation pseudo depth semantic segmentation |
Input: RGB images and pseudo depth maps RGB图像和伪深度图 Step1: Generate pseudo depth maps 生成伪深度图 Step2: Integrate RGB and pseudo depth 结合RGB和伪深度 Step3: Apply Pseudo Depth Aggregation Module (PDAM) 应用伪深度聚合模块 (PDAM) Step4: Utilize diffusion model for feature extraction 利用扩散模型进行特征提取 Output: Segmentation results 分割结果 |
| 8.5 | [8.5] 2503.18408 Fast and Physically-based Neural Explicit Surface for Relightable Human Avatars [{'name': 'Jiacheng Wu, Ruiqi Zhang, Jie Chen, Hui Zhang'}] |
Neural Rendering 神经渲染 | v2 3D reconstruction 三维重建 neural rendering 神经渲染 autonomous systems 自动驾驶 |
Input: Sparse-view videos 稀疏视图视频 Step1: Learning pose-dependent geometry and texture 学习与姿态相关的几何和纹理 Step2: Physically-based rendering and relighting 物理基础渲染与重光照 Output: Relightable human avatars 可重光照的人类化身 |
| 8.5 | [8.5] 2503.18421 4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video [{'name': 'Qiang Hu, Zihan Zheng, Houqiang Zhong, Sihua Fu, Li Song, Xiaoyun Zhang, Guangtao Zhai, Yanfeng Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 4D Gaussian compression Free-Viewpoint Video motion-aware representation |
Input: Free-Viewpoint Video (FVV) sequences 自由视角视频序列 Step1: Establish dynamic Gaussian representation 建立动态高斯表示 Step2: Integrate motion-aware encoding 结合运动感知编码 Step3: Optimize rate-distortion trade-off 优化速率-失真权衡 Output: Compressed and rendered FVV 经过压缩和渲染的FVV |
| 8.5 | [8.5] 2503.18470 MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse [{'name': 'Zhenyu Pan, Han Liu'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D spatial reasoning reinforcement learning vision-language models scene generation |
Input: Room image, user preferences, and object status 房间图像、用户偏好和物体状态 Step1: Generate a reasoning trace alongside a JSON-formatted layout 生成推理跟踪和JSON格式布局 Step2: Evaluate layout using reward signals 通过奖励信号评估布局 Step3: Optimize spatial structures through reinforcement learning 通过强化学习优化空间结构 Output: Enhanced 3D scene generation 改进的三维场景生成 |
| 8.5 | [8.5] 2503.18513 LookCloser: Frequency-aware Radiance Field for Tiny-Detail Scene [{'name': 'Xiaoyu Zhang, Weihong Pan, Chong Bao, Xiyu Zhang, Xiaojun Xiang, Hanqing Jiang, Hujun Bao'}] |
3D Rendering 三维渲染 | v2 Neural Radiance Fields 3D rendering view synthesis frequency analysis autonomous systems |
Input: Scenes with varying frequency details 场景含有变化的频率细节 Step1: 3D frequency quantification 进行3D频率量化 Step2: Frequency-aware rendering 实现频率感知渲染 Step3: Model evaluation and comparison with baselines 评估模型并与基准进行比较 Output: High-fidelity view synthesis outputs 高保真视图合成输出 |
| 8.5 | [8.5] 2503.18540 HiRes-FusedMIM: A High-Resolution RGB-DSM Pre-trained Model for Building-Level Remote Sensing Applications [{'name': 'Guneet Mutreja, Philipp Schuegraf, Ksenia Bittner'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D reconstruction digital surface models building analysis remote sensing |
Input: High-resolution RGB and DSM data 高分辨率RGB和DSM数据 Step1: Data curation and pairing 数据策划和配对 Step2: Dual-encoder architecture development 双编码器架构开发 Step3: Joint representation learning 联合表示学习 Step4: Comprehensive evaluation on downstream tasks 全面评估下游任务 Output: Improved performance on building-level analysis 改进的建筑水平分析性能 |
| 8.5 | [8.5] 2503.18541 UniPCGC: Towards Practical Point Cloud Geometry Compression via an Efficient Unified Approach [{'name': 'Kangli Wang, Wei Gao'}] |
Point Cloud Processing 点云处理 | v2 point cloud compression 3D reconstruction efficiency variable rate |
Input: Point cloud data 点云数据 Step1: Implement Uneven 8-Stage Lossless Coder (UELC) 在无损模式下实施不均匀8阶段无损编码器 (UELC) Step2: Apply Variable Rate and Complexity Module (VRCM) 在有损模式下应用变量速率和复杂性模块 (VRCM) Step3: Combine UELC and VRCM 动态组合UELC和VRCM Output: Compressed point cloud representations 压缩点云表示 |
| 8.5 | [8.5] 2503.18544 Distilling Stereo Networks for Performant and Efficient Leaner Networks [{'name': 'Rafia Rahim, Samuel Woerz, Andreas Zell'}] |
Multi-view Stereo 多视角立体 | v2 Knowledge Distillation Stereo Matching Depth Estimation |
Input: Stereo image pairs 立体图像对 Step1: Design of student network 学生网络设计 Step2: Knowledge distillation pipeline 知识蒸馏流水线 Step3: Evaluation of network performance 网络性能评估 Output: Leaner and faster student networks 精简且快速的学生网络 |
| 8.5 | [8.5] 2503.18631 Robust Lane Detection with Wavelet-Enhanced Context Modeling and Adaptive Sampling [{'name': 'Kunyang Li, Ming Hou'}] |
Autonomous Driving 自动驾驶 | v2 lane detection autonomous driving |
Input: Driving scene images 驾驶场景图像 Step1: Wavelet-enhanced context modeling 小波增强上下文建模 Step2: Adaptive sampling 自适应采样 Step3: Model evaluation 模型评估 Output: Robust lane detection results 鲁棒的车道检测结果 |
| 8.5 | [8.5] 2503.18673 Any6D: Model-free 6D Pose Estimation of Novel Objects [{'name': 'Taeyeop Lee, Bowen Wen, Minjun Kang, Gyuree Kang, In So Kweon, Kuk-Jin Yoon'}] |
3D Reconstruction and Modeling 三维重建 | v2 6D pose estimation object detection computer vision |
Input: Single RGB-D anchor image 单个RGB-D锚图像 Step1: Joint object alignment process 物体对齐处理 Step2: Render-and-compare strategy 渲染与比较策略 Step3: Pose hypothesis generation 生成姿态假设 Output: Accurate 6D pose and size estimation 准确的6D姿态和尺寸估计 |
| 8.5 | [8.5] 2503.18711 Accenture-NVS1: A Novel View Synthesis Dataset [{'name': "Thomas Sugg, Kyle O'Brien, Lekh Poudel, Alex Dumouchelle, Michelle Jou, Marc Bosch, Deva Ramanan, Srinivasa Narasimhan, Shubham Tulsiani"}] |
Novel View Synthesis 新颖视图合成 | v2 novel view synthesis 3D reconstruction multi-view scenes |
Input: Multi-view images 多视角图像 Step1: Data collection 数据收集 Step2: Calibration and geolocation 校准与地理定位 Step3: Dataset integration 数据集整合 Output: ACC-NVS1 dataset ACC-NVS1 数据集 |
| 8.5 | [8.5] 2503.18718 GS-Marker: Generalizable and Robust Watermarking for 3D Gaussian Splatting [{'name': 'Lijiang Li, Jinglu Wang, Xiang Ming, Yan Lu'}] |
Neural Rendering 神经渲染 | v2 3D Gaussian Splatting watermarking 3D models |
Input: 3D Gaussian models 3D高斯模型 Step1: Message embedding 消息嵌入 Step2: Distortion enhancement 扭曲增强 Step3: Watermark extraction 水印提取 Output: Robust watermarked models 可靠水印模型 |
| 8.5 | [8.5] 2503.18725 FG$^2$: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching [{'name': 'Zimin Xia, Alexandre Alahi'}] |
3D Localization and Mapping 3D定位与地图构建 | v2 3D localization fine-grained feature matching autonomous navigation |
Input: Ground-level image and aerial image 地面图像与航空图像 Step1: Map ground image features to 3D point cloud 将地面图像特征映射到3D点云 Step2: Select features along the height dimension 选择高度维度的特征 Step3: Compute point correspondences using Procrustes alignment 使用Procrustes对齐计算点对应关系 Output: Estimated 3 Degrees of Freedom pose of the ground image 估计地面图像的3个自由度姿态 |
| 8.5 | [8.5] 2503.18767 Good Keypoints for the Two-View Geometry Estimation Problem [{'name': 'Konstantin Pakulev, Alexander Vakhitov, Gonzalo Ferrer'}] |
Visual Odometry 视觉里程计 | v2 keypoint detection 关键点检测 homography estimation 单应性估计 structure from motion 运动结构估计 visual SLAM 视觉SLAM |
Input: Image pairs for keypoint detection 图像对用于关键点检测 Step1: Develop a theoretical model for keypoint scoring 建立关键点评分的理论模型 Step2: Identify properties of good keypoints 确定良好关键点的特性 Step3: Design and implement the BoNeSS-ST keypoint detector 设计并实现BoNeSS-ST关键点检测器 Output: Enhanced keypoint performance 改进的关键点表现 |
| 8.5 | [8.5] 2503.18853 3DSwapping: Texture Swapping For 3D Object From Single Reference Image [{'name': 'Xiao Cao, Beibei Lin, Bo Wang, Zhiyong Huang, Robby T. Tan'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D texture swapping view consistency gradient guidance |
Input: Single reference image 单个参考图像 Step1: Progressive generation 逐步生成 Step2: View-consistency gradient guidance 视图一致性梯度引导 Step3: Prompt-tuning based guidance 提示调优引导 Output: High-fidelity texture swaps 高保真纹理交换 |
| 8.5 | [8.5] 2503.18903 Building Blocks for Robust and Effective Semi-Supervised Real-World Object Detection [{'name': 'Moussa Kassem Sbeyti, Nadja Klein, Azarm Nowzad, Fikret Sivrikaya, Sahin Albayrak'}] |
Object Detection 目标检测 | v2 semi-supervised object detection autonomous driving pseudo-labeling |
Input: Real-world datasets with labeled and unlabeled data 真实世界数据集,含标记和未标记数据 Step1: Identify challenges in SSOD under real conditions 确定真实条件下的半监督目标检测中的挑战 Step2: Propose building blocks for performance improvement 提出性能改进的构建模块 Step3: Validate the methods through experiments on autonomous driving datasets 通过在自动驾驶数据集上的实验验证方法 Output: Enhanced semi-supervised object detection performance 改进的半监督目标检测性能 |
| 8.5 | [8.5] 2503.18933 SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction [{'name': 'Enrico Pallotta, Sina Mokhtarzadeh Azar, Shuai Li, Olga Zatsarynna, Juergen Gall'}] |
Image and Video Generation 图像生成与视频生成 | v2 video prediction multi-modal depth RGB |
Input: Past video frames 过去的视频帧 Step1: Modality integration 模态集成 Step2: Multi-modal video prediction 多模态视频预测 Step3: Performance evaluation 性能评估 Output: Future video frames 未来的视频帧 |
| 8.5 | [8.5] 2503.18950 Target-Aware Video Diffusion Models [{'name': 'Taeksoo Kim, Hanbyul Joo'}] |
Image and Video Generation 图像生成与视频生成 | v2 video generation human-object interaction robotics 3D motion synthesis |
Input: An input image and segmentation mask to indicate the target Step1: Extend a baseline image-to-video diffusion model to incorporate the target mask Step2: Introduce a special token to describe the target in the text prompt Step3: Fine-tune the model using a novel cross-attention loss Output: Generated video of actor interacting with the specified target |
| 8.0 | [8.0] 2503.18556 Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models [{'name': 'Bin Li, Dehong Gao, Yeyuan Wang, Linbo Jin, Shanqing Yu, Xiaoyan Cai, Libin Yang'}] |
VLM & VLA 视觉语言模型与对齐 | v2 Large Vision-Language Models hallucinations contrastive decoding |
Input: Image tokens 图像标记 Step1: Attention calculation 注意力计算 Step2: Instruction-based adjustment 基于指令的调整 Step3: Contrastive decoding 对比解码 Output: Adjusted logits 调整后的logit值 (a generic contrastive-decoding sketch follows this table) |
| 7.5 | [7.5] 2503.17700 MAMAT: 3D Mamba-Based Atmospheric Turbulence Removal and its Object Detection Capability [{'name': 'Paul Hill, Zhiming Liu, Nantheera Anantrasirichai'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D convolutions atmospheric turbulence object detection |
Input: Video sequences affected by atmospheric turbulence 受大气湍流影响的视频序列 Step1: Non-rigid registration using deformable 3D convolutions 采用可变形3D卷积进行非刚性配准 Step2: Contrast and detail enhancement using 3D Mamba architecture 采用3D Mamba架构进行对比度和细节增强 Output: Enhanced video with improved object detection capabilities 提升视频质量并改善物体检测能力 |
| 7.5 | [7.5] 2503.18278 TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model [{'name': 'Cheng Yang, Yang Sui, Jinqi Xiao, Lingyi Huang, Yu Gong, Chendi Li, Jinghua Yan, Yu Bai, Ponnuswamy Sadayappan, Xia Hu, Bo Yuan'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 token pruning Vision-Language Models optimization |
Input: Visual tokens 视觉标记 Step1: Token selection 选择标记 Step2: Optimization formulation 优化公式化 Step3: Pruning execution 修剪执行 Output: Reduced token set 减少的标记集 |
| 7.5 | [7.5] 2503.18623 Training-Free Personalization via Retrieval and Reasoning on Fingerprints [{'name': 'Deepayan Das, Davide Talon, Yiming Wang, Massimiliano Mancini, Elisa Ricci'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models personalization multimodal reasoning |
Input: Pre-trained Vision-Language Models (VLMs) 预训练视觉语言模型 Step1: Extract concept fingerprints 提取概念指纹 Step2: Retrieve similar fingerprints from the database 从数据库检索相似指纹 Step3: Validate scores through cross-modal verification 通过跨模态验证核验得分 Output: Personal concept identification 个人概念识别 |
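Several of the VLM entries above (e.g. 2503.18556) adjust output logits with contrastive decoding. Below is a minimal, hypothetical sketch of the generic technique only, not that paper's exact formulation: logits from the full instruction-aligned pass are contrasted against logits from a degraded pass, under an adaptive plausibility mask. The function name and the `alpha`/`beta` hyperparameters are illustrative assumptions.

```python
import torch

def contrastive_decode(logits_full: torch.Tensor,
                       logits_degraded: torch.Tensor,
                       alpha: float = 1.0,
                       beta: float = 0.1) -> torch.Tensor:
    """Generic contrastive decoding over the vocabulary dimension.

    logits_full: logits from the well-conditioned (instruction-aligned) pass.
    logits_degraded: logits from a perturbed / less-informed pass.
    """
    probs_full = torch.softmax(logits_full, dim=-1)
    # Adaptive plausibility mask: keep tokens whose probability is within
    # a factor beta of the most likely token under the full pass.
    keep = probs_full >= beta * probs_full.max(dim=-1, keepdim=True).values
    # Amplify the gap between the two passes, then mask implausible tokens.
    adjusted = (1 + alpha) * logits_full - alpha * logits_degraded
    return adjusted.masked_fill(~keep, float("-inf"))
```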
Arxiv 2025-03-24
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | [9.5] 2503.16591 UniK3D: Universal Camera Monocular 3D Estimation [{'name': 'Luigi Piccinelli, Christos Sakaridis, Mattia Segu, Yung-Hsu Yang, Siyuan Li, Wim Abbeloos, Luc Van Gool'}] |
3D Reconstruction and Modeling 三维重建 | v2 monocular 3D estimation 3D reconstruction |
Input: Single image from any camera type 任何类型相机的单幅图像 Step1: Spherical representation modeling 球面表示建模 Step2: Camera ray decomposition 相机光线分解 Step3: Metric 3D reconstruction 度量3D重建 Output: Coherent 3D point cloud 连贯的3D点云 |
| 9.5 | [9.5] 2503.16653 iFlame: Interleaving Full and Linear Attention for Efficient Mesh Generation [{'name': 'Hanxiao Wang, Biao Zhang, Weize Quan, Dong-Ming Yan, Peter Wonka'}] |
Mesh Reconstruction 网格重建 | v2 mesh generation 3D modeling transformer architecture attention mechanisms |
Input: Mesh representations 网格表示 Step1: Interleaving full and linear attention mechanisms 全注意力与线性注意力机制交错 Step2: Hourglass architecture integration 集成沙漏架构 Step3: Efficiency enhancements 效率增强 Output: High-quality 3D meshes 高质量三维网格 |
| 9.5 | [9.5] 2503.16707 Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding [{'name': 'Jinlong Li, Cristiano Saltori, Fabio Poiesi, Nicu Sebe'}] |
3D Scene Understanding 3D场景理解 | v2 3D scene understanding 3D场景理解 vision-language models 视觉语言模型 uncertainty estimation 不确定性评估 |
Input: Multiple foundation models 多个基础模型 Step1: Feature embedding extraction 特征嵌入提取 Step2: Cross-modal knowledge distillation 跨模态知识蒸馏 Step3: Uncertainty estimation and harmonization 不确定性评估与协调 Output: Enhanced 3D scene understanding 强化的3D场景理解 |
| 9.5 | [9.5] 2503.16710 4D Gaussian Splatting SLAM [{'name': 'Yanyan Li, Youxu Fang, Zunjie Zhu, Kunyi Li, Yong Ding, Federico Tombari'}] |
3D Reconstruction and Modeling 三维重建 | v2 4D Gaussian Splatting SLAM camera localization dynamic scenes 3D reconstruction |
Input: Sequence of RGB-D images RGB-D图像序列 Step1: Generate motion masks 生成运动掩码 Step2: Classify Gaussian primitives into static and dynamic 静态和动态高斯原语分类 Step3: Model transformation fields with sparse control points and MLP 使用稀疏控制点和MLP建模变换场 Output: 4D Gaussian radiance fields 4D高斯辐射场 |
| 9.5 | [9.5] 2503.16776 OpenCity3D: What do Vision-Language Models know about Urban Environments? [{'name': 'Valentin Bieri, Marco Zamboni, Nicolas S. Blumer, Qingxuan Chen, Francis Engelmann'}] |
3D Scene Understanding 三维场景理解 | v2 Vision-Language Models 3D Reconstruction Urban Analytics |
Input: Aerial multi-view images from urban environments 城市环境的多视角航拍图像 Step1: Generate enriched point cloud from 3D reconstructions 从三维重建生成丰富的点云 Step2: Integrate vision-language models to query urban features 集成视觉语言模型以查询城市特征 Step3: Analyze socio-economic properties using language input 使用语言输入分析社会经济属性 Output: Insights into urban characteristics and analytics 对城市特征和分析的洞察 |
| 9.5 | [9.5] 2503.16811 Seg2Box: 3D Object Detection by Point-Wise Semantics Supervision [{'name': 'Maoji Zheng, Ziyu Xu, Qiming Xia, Hai Wu, Chenglu Wen, Cheng Wang'}] |
3D Object Detection 3D物体检测 | v2 3D object detection 3D物体检测 semantic segmentation 语义分割 LiDAR autonomous driving 自动驾驶 |
Input: Point cloud data 点云数据 Step1: Multi-Frame Multi-Scale Clustering (MFMS-C) for pseudo-label generation 多帧多尺度聚类生成伪标签 Step2: Semantic-Guiding Iterative-Mining Self-Training (SGIM-ST) for refining labels 语义引导的迭代挖掘自我训练 Output: Enhanced 3D object detection results 改进的三维物体检测结果 |
| 9.5 | [9.5] 2503.16822 RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos [{'name': 'Yuxin Yao, Zhi Deng, Junhui Hou'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian representation dynamic modeling novel view synthesis |
Input: Monocular videos of articulated objects 关节物体的单目视频 Step1: Extract skeleton-aware nodes from 3D Gaussians 从三维高斯中提取骨骼感知节点 Step2: Simplify skeleton using geometric and semantic information 使用几何和语义信息简化骨骼 Step3: Bind skeleton to 3D Gaussian representation 将骨骼绑定到三维高斯表示 Output: Skeleton-driven dynamic model 骨骼驱动的动态模型 |
| 9.5 | [9.5] 2503.16924 Optimized Minimal 3D Gaussian Splatting [{'name': 'Joo Chan Lee, Jong Hwan Ko, Eunbyung Park'}] |
Neural Rendering 神经渲染 | v2 3D Gaussian Splatting 3D rendering storage optimization |
Input: 3D scenes 3D场景 Step1: Minimize redundancy in Gaussians 最小化高斯冗余 Step2: Create compact attribute representation 创建紧凑属性表示 Step3: Implement sub-vector quantization 实施子向量量化 Output: Reduced storage with minimal Gaussians 减少存储需求的最小高斯 |
| 9.5 | [9.5] 2503.16964 DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery [{'name': 'Jiadong Tang, Yu Gao, Dianyi Yang, Liqi Yan, Yufeng Yue, Yi Yang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction drone imagery Gaussian Splatting dynamic scenes |
Input: Drone imagery 无人机图像 Step1: Data integration 数据集成 Step2: Masking and segmentation 掩码与分割 Step3: Gaussian splatting algorithm implementation 高斯点云算法实现 Step4: 3D model reconstruction 三维模型重建 Output: Robust 3D reconstruction of scenes 稳健的场景三维重建 |
| 9.5 | [9.5] 2503.16970 Distilling Monocular Foundation Model for Fine-grained Depth Completion [{'name': 'Yingping Liang, Yutao Hu, Wenqi Shao, Ying Fu'}] |
Depth Estimation 深度估计 | v2 Depth Completion 深度补全 Monocular Models 单目模型 Knowledge Distillation 知识蒸馏 |
Input: Sparse LiDAR inputs 稀疏LiDAR输入 Step1: Generate diverse training data 生成多样化训练数据 Step2: Distill geometric knowledge 提取几何知识 Step3: Fine-tune with SSI Loss 采用SSI Loss进行微调 Output: Enhanced depth completion models 改进的深度补全模型 |
| 9.5 | [9.5] 2503.17032 TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting [{'name': 'Jianchuan Chen, Jingchuan Hu, Gaige Wang, Zhonghua Jiang, Tiansong Zhou, Zhiwen Chen, Chengfei Lv'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction augmented reality 3D Gaussian Splatting |
Input: Multi-view sequences 多视角序列 Step1: Creation of parametric template 创建参数模板 Step2: Pre-training of StyleUnet-based network 预训练StyleUnet网络 Step3: Baking deformations into MLP network 将变形转化为MLP网络 Output: Real-time rendering of avatars 实时渲染头像 |
| 9.5 | [9.5] 2503.17093 ColabSfM: Collaborative Structure-from-Motion by Point Cloud Registration [{'name': "Johan Edstedt, Andr\'e Mateus, Alberto Jaenal"}] |
3D Reconstruction 三维重建 | v2 3D Reconstruction SfM Point Cloud Registration |
Input: SfM point clouds (3D maps) SfM点云(三维地图) Step1: Estimation of joint reference frame 估计联合参考框架 Step2: Point cloud registration for SfM SfM点云配准 Step3: Neural refiner application 应用神经精炼器 Output: Merged and registered 3D maps 合并和配准的三维地图 |
| 9.5 | [9.5] 2503.17097 R2LDM: An Efficient 4D Radar Super-Resolution Framework Leveraging Diffusion Model [{'name': 'Boyuan Zheng, Shouyi Lu, Renbo Huang, Minqing Huang, Fan Lu, Wei Tian, Guirong Zhuo, Lu Xiong'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 4D radar point clouds super-resolution LiDAR autonomous driving |
Input: Paired raw radar and LiDAR point clouds 原始雷达和激光雷达点云对 Step1: Represent point clouds using voxel features 使用体素特征表示点云 Step2: Apply Latent Voxel Diffusion Model (LVDM) 应用潜在体素扩散模型 Step3: Utilize Latent Point Cloud Reconstruction (LPCR) to rebuild point clouds 使用潜在点云重建模块重建点云 Output: Dense LiDAR-like point clouds 输出:密集的激光雷达样点云 |
| 9.5 | [9.5] 2503.17106 GAA-TSO: Geometry-Aware Assisted Depth Completion for Transparent and Specular Objects [{'name': 'Yizhe Liu, Tong Jia, Da Cai, Hao Wang, Dongyue Chen'}] |
Depth Estimation 深度估计 | v2 depth completion 3D structural features |
Input: RGB-D input including depth and RGB images (输入: 包含深度和RGB图像的RGB-D输入) Step1: Extract 2D features from RGB-D data (步骤1: 从RGB-D数据中提取2D特征) Step2: Back-project depth to a point cloud for 3D feature extraction (步骤2: 将深度反投影到点云以提取3D特征) Step3: Use gated cross-modal fusion modules for integrating 2D and 3D features (步骤3: 使用门控跨模态融合模块整合2D和3D特征) Output: Enhanced depth estimation for transparent and specular objects (输出: 针对透明和高光物体的增强深度估计) |
| 9.5 | [9.5] 2503.17153 Enhancing Steering Estimation with Semantic-Aware GNNs [{'name': 'Fouad Makiyeh, Huy-Dung Nguyen, Patrick Chareyre, Ramin Hasani, Marc Blanchon, Daniela Rus'}] |
3D Reconstruction and Modeling 三维重建 | v2 steering estimation 3D spatial information autonomous driving Graph Neural Networks point clouds |
Input: Monocular images and LiDAR-based point clouds Step1: Estimate depth and semantic maps from 2D images using a unified model Step2: Generate pseudo-3D point clouds from estimated depth Step3: Integrate 3D point clouds with Graph Neural Network (GNN) and Recurrent Neural Network (RNN) for steering estimation Output: Enhanced steering predictions using spatial information |
| 9.5 | [9.5] 2503.17168 Hi-ALPS -- An Experimental Robustness Quantification of Six LiDAR-based Object Detection Systems for Autonomous Driving [{'name': 'Alexandra Arzberger, Ramin Tavakoli Kolagari'}] |
3D Object Detection 3D目标检测 | v2 LiDAR object detection autonomous driving robustness |
Input: LiDAR point cloud data LiDAR点云数据 Step1: Implement Hi-ALPS framework 实现Hi-ALPS框架 Step2: Evaluate robustness of object detection systems 评估目标检测系统的鲁棒性 Step3: Analyze perturbation effects on OD systems 分析对OD系统的扰动影响 Output: Robustness metrics for 3D object detection systems 3D目标检测系统的鲁棒性指标 |
| 9.5 | [9.5] 2503.17182 Radar-Guided Polynomial Fitting for Metric Depth Estimation [{'name': 'Patrick Rim, Hyoungseob Park, Vadim Ezhov, Jeffrey Moon, Alex Wong'}] |
Depth Estimation 深度估计 | v2 3D reconstruction depth estimation autonomous driving radar polynomial fitting |
Input: Radar data and monocular depth predictions 雷达数据与单目深度预测 Step1: Polynomial fitting of depth predictions 深度预测的多项式拟合 Step2: Adaptive adjustment of depth non-uniformly 适应性地对深度进行非均匀调整 Step3: Model training with monotonicity regularization 使用单调性正则化进行模型训练 Output: Metric depth maps and error metrics 度量深度图和误差指标 (a simplified fitting sketch follows this table) |
| 9.5 | [9.5] 2503.17316 Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors [{'name': 'Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy, Lourdes Agapito, Jerome Revaud'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction depth completion multi-view stereo |
Input: Multi-view images, camera intrinsics, and depth inputs (RGB, intrinsics, poses) Step 1: Data integration by incorporating known camera and scene priors Step 2: Lightweight transformer-based model that allows for modality selection during training Step 3: Output pointmaps for relative pose estimation and high-resolution reconstruction |
| 9.2 | [9.2] 2503.16611 A Recipe for Generating 3D Worlds From a Single Image [{'name': 'Katja Schwarz, Denys Rozumnyi, Samuel Rota Bul\`o, Lorenzo Porzi, Peter Kontschieder'}] |
3D Reconstruction 三维重建 | v2 3D reconstruction depth estimation image generation |
Input: Single image (2D panorama) 单幅图像(2D全景) Step1: Generate coherent panoramas using a diffusion model 生成连贯的全景图像,使用扩散模型 Step2: Lift panorama into 3D with a metric depth estimator 利用测量深度估计将全景提升到3D Step3: Inpaint unobserved regions using point clouds 使用点云对未观察区域进行修复 Output: Immersive 3D world 逼真的3D世界 |
| 9.2 | [9.2] 2503.17175 Which2comm: An Efficient Collaborative Perception Framework for 3D Object Detection [{'name': 'Duanrui Yu, Jing You, Xin Pei, Anqi Qu, Dingyu Wang, Shaocheng Jia'}] |
3D Object Detection 3D目标检测 | v2 3D object detection 3D目标检测 collaborative perception 协作感知 semantic detection boxes 语义检测框 |
Input: Multi-agent sparse features 多智能体稀疏特征 Step1: Feature extraction 特征提取 Step2: Temporal fusion 时序融合 Step3: Sparse decoding 稀疏解码 Output: 3D object detection boxes 3D目标检测框 |
| 9.0 | [9.0] 2503.16825 SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion [{'name': 'Xiyue Guo, Jiarui Hu, Junjie Hu, Hujun Bao, Guofeng Zhang'}] |
3D Semantic Scene Completion 3D语义场景补全 | v2 3D semantic scene completion satellite-ground fusion autonomous driving |
Input: Satellite and ground images 卫星和地面图像 Step1: Parallel encoding of satellite and ground views 卫星和地面视图的并行编码 Step2: Feature alignment and correction 特征对齐与修正 Step3: Adaptive fusion of multi-view features 多视角特征的自适应融合 Output: Completed 3D semantic scene 完成的3D语义场景 |
| 9.0 | [9.0] 2503.16979 Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting [{'name': 'Jinbo Yan, Rui Peng, Zhiyan Wang, Luyang Tang, Jiayu Yang, Jie Liang, Jiahao Wu, Ronggang Wang'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D reconstruction dynamic scene reconstruction Gaussian splatting |
Input: Multi-view images 多视角图像 Step1: Generalized Anchor-driven Gaussian Motion Network 引入通用锚点驱动高斯运动网络 Step2: Key-frame-guided Streaming Strategy 关键帧引导流媒体策略 Step3: Real-time evaluation 实时评估 Output: Streamed dynamic scene reconstruction 流媒体动态场景重建 |
| 8.5 | [8.5] 2503.16535 Vision-Language Embodiment for Monocular Depth Estimation [{'name': 'Jinchang Zhang, Guoyu Lu'}] |
Depth Estimation 深度估计 | v2 depth estimation monocular robotic perception |
Input: RGB images and camera intrinsics RGB图像和相机内参 Step1: Calculate embodied scene depth 计算具身场景深度 Step2: Integrate depth with image features 将深度与图像特征集成 Step3: Use language priors for scene understanding 利用语言先验进行场景理解 Output: Enhanced depth estimations 改进的深度估计 |
| 8.5 | [8.5] 2503.16538 Leveraging Vision-Language Models for Open-Vocabulary Instance Segmentation and Tracking [{'name': 'Bastian P\"atzold, Jan Nogga, Sven Behnke'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models Instance Segmentation Open-Vocabulary Detection Robotics |
Input: Structured descriptions from vision-language models (VLMs) 视觉语言模型生成的结构化描述 Step1: Identify visible object instances 识别可见物体实例 Step2: Inform open-vocabulary detector 通知开放词汇探测器 Step3: Extract bounding boxes 提取边界框 Step4: Process image streams in real time 以实时处理图像流 Output: Segmentation masks and tracking capabilities 分割掩码和跟踪能力 |
| 8.5 | [8.5] 2503.16579 World Knowledge from AI Image Generation for Robot Control [{'name': 'Jonas Krumme, Christoph Zetzsche'}] |
Autonomous Systems and Robotics 自主系统与机器人 | v2 Generative AI Image Generation Robot Control Implicit Knowledge |
Input: Images generated by AI 由AI生成的图像 Step1: Analyze world knowledge 分析世界知识 Step2: Apply knowledge to robot tasks 将知识应用于机器人任务 Step3: Generate contextually relevant images 生成上下文相关的图像 Output: Enhanced robot task performance 提高机器人的任务表现 |
| 8.5 | [8.5] 2503.16709 QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge [{'name': 'Xuan Shen, Weize Ma, Jing Liu, Changdi Yang, Rui Ding, Quanyi Wang, Henghui Ding, Wei Niu, Yanzhi Wang, Pu Zhao, Jun Lin, Jiuxiang Gu'}] |
Depth Estimation 深度估计 | v2 Monocular Depth Estimation Post-Training Quantization 3D Reconstruction |
Input: Monocular images 单目图像 Step1: Analyze outlier distribution 分析异常分布 Step2: Apply LogNP polishing optimization 应用LogNP平滑优化 Step3: Update weights for activation compensation 更新权重以补偿激活 Step4: Perform weight quantization with reconstruction 进行带重构的权重量化 Output: Efficient depth estimation model 高效的深度估计模型 |
| 8.5 | [8.5] 2503.16742 Digitally Prototype Your Eye Tracker: Simulating Hardware Performance using 3D Synthetic Data [{'name': 'Esther Y. H. Lin, Yimin Ding, Jogendra Kundu, Yatong An, Mohamed T. El-Haddad, Alexander Fix'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D synthetic data eye tracking NeRF hardware evaluation augmented reality |
Input: Real 3D eyes data from light dome captures Step1: Create a hybrid mesh-NeRF representation for eye modeling Step2: Develop an optical simulator for camera effects Step3: Synthesize novel viewpoints and evaluate performance Output: Enhanced predictions of eye tracker hardware performance |
| 8.5 | [8.5] 2503.16910 Salient Object Detection in Traffic Scene through the TSOD10K Dataset [{'name': 'Yu Qiu, Yuhang Sun, Jie Mei, Lin Xiao, Jing Xu'}] |
Autonomous Systems and Robotics 自动驾驶系统与机器人 | v2 salient object detection traffic scenes TSOD10K |
Input: Traffic images 交通图像 Step1: Data collection 数据收集 Step2: Dataset creation 数据集创建 Step3: Model development 模型开发 Step4: Evaluation of models 模型评估 Output: Traffic salient object detection results 交通显著性对象检测结果 |
| 8.5 | [8.5] 2503.16976 GeoT: Geometry-guided Instance-dependent Transition Matrix for Semi-supervised Tooth Point Cloud Segmentation [{'name': 'Weihao Yu, Xiaoqing Guo, Chenxin Li, Yifan Liu, Yixuan Yuan'}] |
Point Cloud Processing 点云处理 | v2 3D segmentation tooth point clouds semi-supervised learning |
Input: Intra-oral scans 口腔内扫描 Step1: Introduce geometric priors 引入几何先验 Step2: Estimate instance-dependent transition matrix (IDTM) 估计实例相关转移矩阵 Step3: Perform segmentation 执行分割 Output: Segmented tooth point clouds 分割的牙齿点云 |
| 8.5 | [8.5] 2503.17044 ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail [{'name': 'Chandan Yeshwanth, David Rozenberszki, Angela Dai'}] |
3D Scene Understanding 三维场景理解 | v2 3D captioning 3D scene understanding Vision-Language Models |
Input: 3D scene scans 3D场景扫描 Step1: Object detection and 3D understanding 对象检测与3D理解 Step2: Multi-level caption generation 多级描述生成 Output: Object- and part-level detailed captions 对象和部分级别的详细描述 |
| 8.5 | [8.5] 2503.17122 R-LiViT: A LiDAR-Visual-Thermal Dataset Enabling Vulnerable Road User Focused Roadside Perception [{'name': 'Jonas Mirlach, Lei Wan, Andreas Wiedholz, Hannan Ejaz Keen, Andreas Eich'}] |
Autonomous Systems and Robotics 自主系统与机器人 | v2 LiDAR thermal imaging autonomous driving Vulnerable Road Users |
Input: Multi-modal sensors (LiDAR, RGB, and thermal) 多模态传感器 (激光雷达、RGB和热成像) Step1: Data collection 数据收集 Step2: Annotation and alignment 标注与对齐 Step3: Dataset release 数据集发布 Output: R-LiViT dataset R-LiViT 数据集 |
| 8.5 | [8.5] 2503.17197 FreeUV: Ground-Truth-Free Realistic Facial UV Texture Recovery via Cross-Assembly Inference Strategy [{'name': 'Xingchao Yang, Takafumi Taketomi, Yuki Endo, Yoshihiro Kanamori'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D texture recovery facial UV textures |
Input: Single-view 2D images 单视角二维图像 Step1: Appearance feature extraction 外观特征提取 Step2: Structural consistency training 结构一致性训练 Step3: Cross-Assembly inference integration 交叉组装推理集成 Output: Realistic 3D facial UV textures 逼真的三维面部UV纹理 |
| 8.5 | [8.5] 2503.17352 OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement [{'name': 'Yihe Deng, Hritik Bansal, Fan Yin, Nanyun Peng, Wei Wang, Kai-Wei Chang'}] |
Vision-Language Models 视觉语言模型 | v2 vision-language models reasoning capabilities reinforcement learning |
Input: Large vision-language models (LVLMs) 大型视觉语言模型 Step1: Distill reasoning capabilities from text models 从文本模型中提取推理能力 Step2: Generate reasoning steps using image captions 使用图像说明生成推理步骤 Step3: Utilize supervised fine-tuning (SFT) for initial training 利用监督微调进行初始训练 Step4: Apply reinforcement learning (RL) for iterative improvement 应用强化学习进行迭代改进 Output: Improved LVLM with enhanced reasoning capabilities 改进的LVLM,具有增强的推理能力 |
| 8.5 | [8.5] 2503.17358 Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image [{'name': 'Jerred Chen, Ronald Clark'}] |
Visual Odometry 视觉里程计 | v2 camera motion estimation visual odometry motion blur single image |
Input: Single motion-blurred image 单张运动模糊图像 Step1: Predict motion flow field and monocular depth map 预测运动流场和单目深度图 Step2: Solve linear least squares problem 解决线性最小二乘问题 Output: Instantaneous camera velocity estimate 瞬时相机速度估计 |
| 7.5 | [7.5] 2503.17142 Not Only Text: Exploring Compositionality of Visual Representations in Vision-Language Models [{'name': 'Davide Berasi, Matteo Farina, Massimiliano Mancini, Elisa Ricci, Nicola Strisciuglio'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models compositionality visual embeddings image generation |
Input: Visual embeddings from pre-trained VLMs 来自预训练视觉语言模型的视觉嵌入 Step1: Analyze visual compositionality 分析视觉组合性 Step2: Develop Geodesically Decomposable Embeddings (GDE) 开发测地可分解嵌入 Step3: Evaluate on compositional classification and group robustness 在组合分类和组鲁棒性上评估 Output: Enhanced understanding of visual embeddings 提升对视觉嵌入的理解 |
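Radar-guided polynomial fitting (2503.17182 above) scales relative monocular depth to metric depth using sparse radar returns. Below is a deliberately simplified sketch under assumed inputs (a dense monocular depth map, a sparse radar depth map, and a validity mask); the paper fits adaptively and adds monotonicity regularization, both of which this single global fit omits.

```python
import numpy as np

def radar_guided_depth(mono_depth: np.ndarray,
                       radar_depth: np.ndarray,
                       radar_mask: np.ndarray,
                       degree: int = 2) -> np.ndarray:
    """Fit a low-order polynomial mapping monocular depth -> metric depth
    at pixels with radar returns, then apply it to the whole map."""
    x = mono_depth[radar_mask]   # monocular predictions at radar hits
    y = radar_depth[radar_mask]  # metric radar depths at the same pixels
    coeffs = np.polyfit(x, y, deg=degree)
    return np.polyval(coeffs, mono_depth)  # dense metric depth estimate
```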
Arxiv 2025-03-21
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | [9.5] 2503.15671 CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image [{'name': 'Arindam Dutta, Meng Zheng, Zhongpai Gao, Benjamin Planche, Anwesha Choudhuri, Terrence Chen, Amit K. Roy-Chowdhury, Ziyan Wu'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D reconstruction occlusion management human modeling |
Input: Single occluded image 单个被遮挡的图像 Step1: Generate occlusion-free views 生成无遮挡视图 Step2: Apply multiview diffusion model 应用多视角扩散模型 Step3: Predict 3D Gaussians 预测3D高斯 Output: Cohesive 3D representation 连续的3D表示 |
| 9.5 | [9.5] 2503.15672 GASP: Unifying Geometric and Semantic Self-Supervised Pre-training for Autonomous Driving [{'name': 'William Ljungbergh, Adam Lilja, Adam Tonderski. Arvid Laveno Ling, Carl Lindstr\"om, Willem Verbeke, Junsheng Fu, Christoffer Petersson, Lars Hammarstrand, Michael Felsberg'}] |
Autonomous Driving 自动驾驶 | v2 self-supervised learning occupancy prediction autonomous driving |
Input: Future lidar scans, camera images, and ego poses 未来激光雷达扫描、相机图像和自我姿态 Step1: Model geometric and semantic occupancy prediction 模型几何和语义占用预测 Step2: Learn unified representation 学习统一表示 Step3: Validate on autonomous driving benchmarks 在自动驾驶基准上验证 Output: Structured, generalizable representation of the environment 结构化、可泛化的环境表示 |
| 9.5 | [9.5] 2503.15712 SPNeRF: Open Vocabulary 3D Neural Scene Segmentation with Superpoints [{'name': 'Weiwen Hu, Niccol\`o Parodi, Marcus Zepp, Ingo Feldmann, Oliver Schreer, Peter Eisert'}] |
3D Segmentation 3D 分割 | v2 3D segmentation 3D 分割 Neural Radiance Fields 神经辐射场 geometric primitives 几何原语 CLIP |
Input: 3D scenes with CLIP features 处理含有 CLIP 特征的 3D 场景 Step1: Integrate geometric primitives into NeRF 在 NeRF 中整合几何原语 Step2: Generate primitive-wise CLIP features 生成原语级 CLIP 特征 Step3: Apply primitive-based merging with affinity scoring 使用具有亲和力评分的原语合并 Output: Improved 3D segmentation results 改进的 3D 分割结果 |
| 9.5 | [9.5] 2503.15742 Uncertainty-Aware Diffusion Guided Refinement of 3D Scenes [{'name': 'Sarosij Bose, Arindam Dutta, Sayak Nag, Junge Zhang, Jiachen Li, Konstantinos Karydis, Amit K. Roy Chowdhury'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction view synthesis uncertainty quantification |
Input: Single RGB image 单一RGB图像 Step1: Gaussian parameter optimization 高斯参数优化 Step2: Iterative refinement 迭代精炼 Step3: Scene rendering 场景渲染 Output: Enhanced 3D scene 改进的3D场景 |
| 9.5 | [9.5] 2503.15763 OffsetOPT: Explicit Surface Reconstruction without Normals [{'name': 'Huan Lei'}] |
3D Reconstruction and Modeling 三维重建 | v2 surface reconstruction 3D point clouds neural networks geometry processing |
Input: 3D point clouds 三维点云 Step1: Train a neural network to predict surface triangles 训练神经网络以预测表面三角形 Step2: Optimize per-point offsets to improve triangle predictions 优化每个点的偏移以提高三角形预测 Output: Reconstructed explicit surfaces 还原的显式表面 |
| 9.5 | [9.5] 2503.15835 BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting [{'name': 'Yiren Lu, Yunlai Zhou, Disheng Liu, Tuo Liang, Yu Yin'}] |
3D Reconstruction and Modeling 三维重建 | v2 dynamic scene reconstruction 3D Gaussian Splatting motion blur camera motion object motion |
Input: Blurry images with dynamic scenes 含动态场景的模糊图像 Step1: Camera motion deblurring 相机运动去模糊 Step2: Object motion deblurring 物体运动去模糊 Step3: Image alignment with sharp inputs 与清晰输入图像对齐 Output: High-quality dynamic scene reconstructions 高质量动态场景重建 |
| 9.5 | [9.5] 2503.15855 VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling [{'name': 'Hyojun Go, Byeongjun Park, Hyelin Nam, Byung-Hoon Kim, Hyungjin Chung, Changick Kim'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting text-to-3D generation multi-view images |
Input: Text prompts 文本提示 Step1: Dual-stream architecture development 双流架构开发 Step2: Joint modeling of multi-view images and camera poses 多视角图像和相机姿态的联合建模 Step3: Asynchronous sampling strategy implementation 异步采样策略实施 Output: Realistic 3D Gaussian splats 真实的3D高斯点云 |
| 9.5 | [9.5] 2503.15897 Learning 3D Scene Analogies with Neural Contextual Scene Maps [{'name': 'Junho Kim, Gwangtak Bae, Eun Sun Lee, Young Min Kim'}] |
3D Reconstruction and Modeling 3D重建与建模 | v2 3D scene analogy neural contextual scene maps trajectory transfer object placement |
Input: 3D scenes with regions having spatial relationships 3D场景与空间关系区域 Step1: Extract descriptor fields from scenes 从场景中提取描述符字段 Step2: Align descriptor fields using smooth maps 使用平滑映射对齐描述符字段 Step3: Estimate dense mappings between scene regions 估计场景区域之间的密集映射 Output: Neural contextual scene maps 神经上下文场景地图 |
| 9.5 | [9.5] 2503.15898 Reconstructing In-the-Wild Open-Vocabulary Human-Object Interactions [{'name': 'Boran Wen, Dingbang Huang, Zichen Zhang, Jiahong Zhou, Jianbin Deng, Jingyu Gong, Yulong Chen, Lizhuang Ma, Yong-Lu Li'}] |
3D Reconstruction 三维重建 | v2 3D reconstruction human-object interaction autonomous systems |
Input: Single images 单幅图像 Step1: Data acquisition 数据获取 Step2: 3D annotation pipeline development 3D注释管道开发 Step3: Use Gaussian-HOI optimizer 高斯HOI优化器 Output: Open-vocabulary 3D HOI dataset 开放词汇3D HOI数据集 |
| 9.5 | [9.5] 2503.15908 Enhancing Close-up Novel View Synthesis via Pseudo-labeling [{'name': 'Jiatong Xia, Libo Sun, Lingqiao Liu'}] |
Neural Rendering 神经渲染 | v2 novel view synthesis pseudo-labeling Neural Radiance Fields close-up views |
Input: Training images with distant viewpoints 远处视角的训练图像 Step1: Generate virtual close-up viewpoints 生成虚拟近距离视角 Step2: Create warped images from original training images 根据原始训练图像创建变形图像 Step3: Evaluate consistency and occlusion for pseudo-training data 评估伪训练数据的一致性和遮挡 Step4: Train radiance fields with the pseudo-training data 使用伪训练数据训练辐射场模型 Output: Enhanced rendering of close-up views 改进的近距离视角渲染 |
| 9.5 | [9.5] 2503.15917 Learning to Efficiently Adapt Foundation Models for Self-Supervised Endoscopic 3D Scene Reconstruction from Any Cameras [{'name': 'Beilei Cui, Long Bai, Mobarakol Islam, An Wang, Zhiqi Ma, Yiming Huang, Feng Li, Zhen Chen, Zhongliang Jiang, Nassir Navab, Hongliang Ren'}] |
3D Reconstruction 三维重建 | v2 3D scene reconstruction self-supervised learning depth estimation endoscopic surgery |
Input: Surgical videos from any cameras 从任意相机获取的手术视频 Step1: Efficient adaptation of foundation models 基础模型的高效适应 Step2: Simultaneous estimation of depth maps, poses, and camera parameters 同时估计深度图、姿态和相机参数 Step3: 3D scene reconstruction pipeline using estimated parameters 使用估计的参数进行三维场景重建 Output: Optimized 3D scene reconstruction 优化的三维场景重建 |
| 9.5 | [9.5] 2503.15975 Acc3D: Accelerating Single Image to 3D Diffusion Models via Edge Consistency Guided Score Distillation [{'name': 'Kendong Liu, Zhiyu Zhu, Hui Liu, Junhui Hou'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction image-to-3D generation diffusion models |
Input: Single images 单张图像 Step1: Edge consistency-based refinement 基于边缘一致性的改进 Step2: Score function regularization 分数函数正则化 Step3: Adversarial augmentation 对抗性增强 Output: High-quality 3D models 高质量三维模型 |
| 9.5 | [9.5] 2503.15997 Automating 3D Dataset Generation with Neural Radiance Fields [{'name': 'P. Schulz, T. Hempel, A. Al-Hamadi'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D dataset generation neural radiance fields pose estimation |
Input: 2D images of target objects 目标对象的2D图像 Step1: 3D model creation 3D模型创建 Step2: Dataset generation 数据集生成 Output: Annotated 3D datasets 带注释的3D数据集 |
| 9.5 | [9.5] 2503.16263 From Monocular Vision to Autonomous Action: Guiding Tumor Resection via 3D Reconstruction [{'name': "Ayberk Acar, Mariana Smith, Lidia Al-Zogbi, Tanner Watts, Fangjie Li, Hao Li, Nural Yilmaz, Paul Maria Scheikl, Jesse F. d'Almeida, Susheela Sharma, Lauren Branscombe, Tayfun Efe Ertop, Robert J. Webster III, Ipek Oguz, Alan Kuntz, Axel Krieger, Jie Ying Wu"}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D reconstruction monocular vision tumor resection structure from motion |
Input: RGB images RGB图像 Step1: Data integration 数据集成 Step2: Algorithm evaluation 算法评估 Step3: Segmentation generation 分割生成 Output: Segmented point clouds with 3D reconstruction 带有三维重建的分割点云 |
| 9.5 | [9.5] 2503.16282 Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model [{'name': 'Zhaochong An, Guolei Sun, Yun Liu, Runjia Li, Junlin Han, Ender Konukoglu, Serge Belongie'}] |
3D point cloud segmentation 点云分割 | v2 3D point cloud segmentation Vision-Language Models few-shot learning |
Input: 3D point cloud data 3D点云数据 Step1: Pseudo-label selection 伪标签选择 Step2: Adaptive infilling strategy 自适应填充策略 Step3: Base mix strategy 基础混合策略 Output: Enhanced segmentation model 改进的分割模型 |
| 9.5 | [9.5] 2503.16302 Unleashing Vecset Diffusion Model for Fast Shape Generation [{'name': 'Zeqiang Lai, Yunfei Zhao, Zibo Zhao, Haolin Liu, Fuyun Wang, Huiwen Shi, Xianghui Yang, Qinxiang Lin, Jinwei Huang, Yuhong Liu, Jie Jiang, Chunchao Guo, Xiangyu Yue'}] |
3D Generation 三维生成 | v2 3D shape generation 3D形状生成 diffusion models 扩散模型 VAE Variational Autoencoder 变分自编码器 |
Input: 3D shape data 3D形状数据 Step1: Analyze diffusion sampling 分析扩散采样 Step2: Implement FlashVDM framework 实现FlashVDM框架 Step3: Optimize VAE decoding 优化VAE解码 Output: High-speed 3D shape generation 高速三维形状生成 |
| 9.5 | [9.5] 2503.16318 Dynamic Point Maps: A Versatile Representation for Dynamic 3D Reconstruction [{'name': 'Edgar Sucar, Zihang Lai, Eldar Insafutdinov, Andrea Vedaldi'}] |
3D Reconstruction and Modeling 三维重建 | v2 Dynamic Point Maps 3D Reconstruction Video Depth Prediction |
Input: Pair of images 图像对 Step1: Define point maps 定义点图 Step2: Predict dynamic point maps 预测动态点图 Step3: Evaluate across benchmarks 在基准上评估 Output: Enhanced dynamic reconstruction 改进的动态重建 |
| 9.5 | [9.5] 2503.16338 Gaussian Graph Network: Learning Efficient and Generalizable Gaussian Representations from Multi-view Images [{'name': 'Shengjun Zhang, Xin Fei, Fangfu Liu, Haixu Song, Yueqi Duan'}] |
3D Reconstruction and Modeling 三维重建 | v2 Gaussian Graph Network multi-view images 3D Gaussian Splatting efficient representation novel view synthesis |
Input: Multi-view images 多视角图像 Step1: Construct Gaussian Graphs 建立高斯图 Step2: Message passing at Gaussian level 高斯级别的消息传递 Step3: Gaussian pooling aggregation 高斯池化聚合 Output: Efficient Gaussian representations 高效的高斯表示 |
| 9.5 | [9.5] 2503.16399 SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World [{'name': 'Chen Chen, Zhirui Wang, Taowei Sheng, Yi Jiang, Yundu Li, Peirui Cheng, Luning Zhang, Kaiqiang Chen, Yanfeng Hu, Xue Yang, Xian Sun'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D occupancy prediction satellite imagery autonomous driving |
Input: Historical satellite imagery and street-view images 历史卫星图像与街道视图图像 Step1: Data integration with GPS & IMU data 数据集成与 GPS 和 IMU 数据 Step2: Implement Dynamic-Decoupling Fusion for inconsistencies 进行动态解耦融合以解决不一致问题 Step3: Use 3D-Proj Guidance for feature extraction 使用 3D 投影引导进行特征提取 Step4: Apply Uniform Sampling Alignment for sampling adjustments 使用均匀采样对齐进行采样调整 Output: Enhanced 3D occupancy prediction model 输出: 改进的 3D 占用预测模型 |
| 9.5 | [9.5] 2503.16412 DreamTexture: Shape from Virtual Texture with Analysis by Augmentation [{'name': 'Ananta R. Bhattarai, Xingzhe He, Alla Sheffer, Helge Rhodin'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction monocular depth texture alignment |
Input: Monocular images 单目图像 Step1: Texture alignment 纹理对齐 Step2: Depth reconstruction 深度重建 Step3: Texture optimization 纹理优化 Output: 3D object representation 3D对象表示 |
| 9.5 | [9.5] 2503.16413 M3: 3D-Spatial MultiModal Memory [{'name': 'Xueyan Zou, Yuchen Song, Ri-Zhao Qiu, Xuanbin Peng, Jianglong Ye, Sifei Liu, Xiaolong Wang'}] |
3D Spatial Memory 3D空间记忆 | v2 3D Gaussian Splatting multimodal memory autonomous systems |
Input: Scene video clips 场景视频片段 Step1: Implement Gaussian splatting 实现高斯点绘制 Step2: Integrate features from foundation models 集成基础模型特征 Step3: Optimize memory structure 优化记忆结构 Output: Compressed multimodal memory 压缩的多模态记忆 |
| 9.5 | [9.5] 2503.16429 Sonata: Self-Supervised Learning of Reliable Point Representations [{'name': 'Xiaoyang Wu, Daniel DeTone, Duncan Frost, Tianwei Shen, Chris Xie, Nan Yang, Jakob Engel, Richard Newcombe, Hengshuang Zhao, Julian Straub'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D self-supervised learning point cloud representation quality |
Input: Point clouds 点云 Step1: Identify geometric shortcuts 识别几何捷径 Step2: Apply self-supervised learning techniques 应用自监督学习技术 Step3: Enhance representation quality 提升表示质量 Output: Reliable point representations 可靠的点表示 |
| 9.2 | [9.2] 2503.15667 DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis [{'name': 'Yuming Gu, Phong Tran, Yujian Zheng, Hongyi Xu, Heyuan Li, Adilbek Karmanov, Hao Li'}] |
Image Generation 图像生成 | v2 360-degree synthesis human head generation neural rendering |
Input: Single-view portrait images 单视角肖像图像 Step1: Generate back-of-head details using ControlNet 生成后脑勺细节 Step2: Dual appearance module ensures consistency 采用双重外观模块确保一致性 Step3: Train on continuous view sequences 训练于连续视图序列 Output: Generate 360-degree consistent head views 生成360度一致的头部视图 |
| 9.0 | [9.0] 2503.15666 Toward Scalable, Flexible Scene Flow for Point Clouds [{'name': 'Kyle Vedder'}] |
3D Reconstruction and Modeling 3D重建与建模 | v2 scene flow point clouds 3D motion estimation scalability |
Input: Temporally successive point cloud observations 时间上连续的点云观测 Step1: Contextualize scene flow and prior methods 上下文化场景流及其先前方法 Step2: Build scalable scene flow estimators 构建可扩展的场景流估计器 Step3: Introduce a benchmark for estimate quality 引入估计质量基准 Step4: Develop an unsupervised scene flow estimator 开发无监督场景流估计器 Output: Enhanced scene flow estimations 改进的场景流估计 |
| 9.0 | [9.0] 2503.15877 Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation [{'name': 'Tiange Xiang, Kai Li, Chengjiang Long, Christian H\"ane, Peihong Guo, Scott Delp, Ehsan Adeli, Li Fei-Fei'}] |
3D Generation 三维生成 | v2 3D generation diffusion models Gaussian fitting |
Input: Pre-trained 2D diffusion models 预训练的2D扩散模型 Step1: Create Gaussian Atlas from 3D objects 从3D对象创建高斯图 Step2: Fine-tune 2D models for 3D output 对2D模型进行微调以生成3D输出 Output: Generated 3D Gaussian structures 生成的3D高斯结构 |
| 9.0 | [9.0] 2503.16422 1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering [{'name': 'Yuheng Yuan, Qiuhong Shen, Xingyi Yang, Xinchao Wang'}] |
Dynamic Scene Rendering 动态场景渲染 | v2 4D Gaussian Splatting dynamic scene reconstruction real-time rendering |
Input: Dynamic scene data 动态场景数据 Step1: Analyze temporal redundancy 分析时间冗余 Step2: Implement 4DGS-1K framework 实施4DGS-1K框架 Step3: Prune short-lifespan Gaussians 修剪短生命周期的高斯 Step4: Filter inactive Gaussians 过滤非活动高斯 Output: Optimized scene representation 优化的场景表示 (a toy pruning sketch follows this table) |
| 8.5 | [8.5] 2503.15676 High Temporal Consistency through Semantic Similarity Propagation in Semi-Supervised Video Semantic Segmentation for Autonomous Flight [{'name': "C\'edric Vincent, Taehyoung Kim, Henri Mee{\ss}"}] |
Autonomous Systems and Robotics 自主系统与机器人 | v2 video semantic segmentation autonomous systems temporal consistency |
Input: Aerial video frames 航空视频帧 Step1: Semantic segmentation using image model 使用图像模型进行语义分割 Step2: Temporal prediction propagation 时间预测传播 Step3: Knowledge distillation for semi-supervised training 半监督训练的知识蒸馏 Output: Consistent and accurate segmentation predictions 一致和准确的分割预测 |
| 8.5 | [8.5] 2503.15778 AutoDrive-QA- Automated Generation of Multiple-Choice Questions for Autonomous Driving Datasets Using Large Vision-Language Models [{'name': 'Boshra Khalili, Andrew W. Smyth'}] |
Autonomous Systems and Robotics 自主系统与机器人 | v2 autonomous driving question answering vision-language models |
Input: Driving QA datasets 驾驶问答数据集 Step1: Data integration 数据集成 Step2: MCQ conversion methodology MCQ转换方法 Step3: Evaluation on public datasets 在公共数据集上评估 Output: Standardized evaluation framework 标准化评估框架 |
| 8.5 | [8.5] 2503.15818 Computation-Efficient and Recognition-Friendly 3D Point Cloud Privacy Protection [{'name': 'Haotian Ma, Lin Gu, Siyi Wu, Yingying Zhu'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D point cloud privacy protection flow-based generative model |
Input: 3D point cloud data 3D点云数据 Step1: Define the 3D point cloud privacy problem 定义3D点云隐私问题 Step2: Implement the PointFlowGMM framework 实现PointFlowGMM框架 Step3: Project point cloud into latent Gaussian mixture space 将点云投影到潜在高斯混合空间 Step4: Apply rotation for privacy protection 应用旋转以保护隐私 Output: Encrypted 3D point clouds with preserved classification capabilities 输出: 具有保留分类能力的加密3D点云 |
| 8.5 | [8.5] 2503.15875 MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving [{'name': 'Haiguang Wang, Daqi Liu, Hongwei Xie, Haisong Liu, Enhui Ma, Kaicheng Yu, Limin Wang, Bing Wang'}] |
Video Generation 视频生成 | v2 video generation autonomous driving world models |
Input: Multi-view video data 多视角视频数据 Step1: Generate high-fidelity long videos 生成高保真长时间视频 Step2: Stabilize video generation 稳定视频生成 Step3: Correct distortion of dynamic objects 修正动态物体的失真 Output: Long-duration coherent videos 长时段的一致视频 |
| 8.5 | [8.5] 2503.15905 Jasmine: Harnessing Diffusion Prior for Self-supervised Depth Estimation [{'name': 'Jiyuan Wang, Chunyu Lin, Cheng Guan, Lang Nie, Jing He, Haodong Li, Kang Liao, Yao Zhao'}] |
Depth Estimation 深度估计 | v2 Depth Estimation 深度估计 Self-supervised Learning 自监督学习 Stable Diffusion 稳定扩散 |
Input: Monocular images 单目图像 Step1: Hybrid image reconstruction 混合图像重建 Step2: Scale-Shift GRU development 比例-偏移GRU开发 Step3: Self-supervised depth estimation 自监督深度估计 Output: Accurate depth maps 精确的深度图 |
| 8.5 | [8.5] 2503.15910 No Thing, Nothing: Highlighting Safety-Critical Classes for Robust LiDAR Semantic Segmentation in Adverse Weather [{'name': 'Junsung Park, Hwijeong Lee, Inha Kang, Hyunjung Shim'}] |
Autonomous Systems and Robotics 自主系统与机器人 | v2 LiDAR semantic segmentation autonomous driving adverse weather |
Input: LiDAR point cloud data Step1: Identify performance gaps in existing models Step2: Develop methods to bind point features to superclasses Step3: Define local regions for cleaning data Output: Improved predictions for 'things' categories |
| 8.5 | [8.5] 2503.16000 SenseExpo: Efficient Autonomous Exploration with Prediction Information from Lightweight Neural Networks [{'name': 'Haojia Gao, Haohua Que, Hoiian Au, Weihao Shan, Mingkai Liu, Yusen Qin, Lei Mu, Rong Zhao, Xinghua Yang, Qi Wei, Fei Qiao'}] |
Autonomous Systems and Robotics 自主系统与机器人 | v2 autonomous exploration prediction network Generative Adversarial Networks (GANs) Robotics efficient frameworks |
Input: Partial observations captured by the robot's onboard sensors 通过机器人的传感器捕获的部分观测 Step1: Local map prediction 基于局部地图的预测 Step2: Model integration with GANs, Transformers, and FFC 用GAN、Transformer和FFC集成模型 Step3: Efficiency evaluation 评估效率 Output: Efficient autonomous exploration framework 高效的自主探索框架 |
| 8.5 | [8.5] 2503.16125 Uncertainty Meets Diversity: A Comprehensive Active Learning Framework for Indoor 3D Object Detection [{'name': 'Jiangyi Wang, Na Zhao'}] |
3D Object Detection 3D目标检测 | v2 3D object detection active learning indoor environments uncertainty diversity |
Input: Indoor 3D object data 室内3D物体数据 Step1: Sample uncertainty assessment 样本不确定性评估 Step2: Diversity optimization 多样性优化 Step3: Active sample selection 主动样本选择 Output: Annotated samples for indoor 3D detection 为室内3D检测注释的样本 |
| 8.5 | [8.5] 2503.16289 SceneMI: Motion In-betweening for Modeling Human-Scene Interactions [{'name': 'Inwoo Hwang, Bing Zhou, Young Min Kim, Jian Wang, Chuan Guo'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D reconstruction motion in-betweening human-scene interactions generative modeling |
Input: Noisy keyframes and scenes 噪声关键帧和场景 Step1: Scene encoding through dual descriptors 场景编码通过双重描述符 Step2: Dual scene context processing 双重场景上下文处理 Step3: Denoising and keyframe interpolation 去噪和关键帧插值 Output: Smooth motion transitions and scene reconstructions 平滑的运动过渡和场景重建 |
| 8.5 | [8.5] 2503.16378 Panoptic-CUDAL Technical Report: Rural Australia Point Cloud Dataset in Rainy Conditions [{'name': 'Tzu-Yun Tseng, Alexey Nekrasov, Malcolm Burdorf, Bastian Leibe, Julie Stephany Berrio, Mao Shan, Stewart Worrall'}] |
Autonomous Driving 自动驾驶 | v2 LiDAR autonomous driving dataset panoptic segmentation |
Input: Synchronized sensor data 同步传感器数据 Step1: Data collection 数据收集 Step2: Annotation of LiDAR and image data LiDAR 和图像数据的标注 Step3: Model evaluation and analysis 模型评估与分析 Output: Baseline results for segmentation methods 语义分割方法的基线结果 |
| 8.5 | [8.5] 2503.16396 SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation [{'name': 'Chun-Han Yao, Yiming Xie, Vikram Voleti, Huaizu Jiang, Varun Jampani'}] |
Image and Video Generation 图像生成与视频生成 | v2 3D asset generation multi-view video 4D generation video diffusion model |
Input: Monocular video 单目视频 Step1: Network architecture modification 网络架构修改 Step2: Data curation 数据整理 Step3: Progressive training strategy 逐步训练策略 Step4: 4D optimization 4D优化 Output: High-quality multi-view videos 高质量多视角视频 |
| 8.5 | [8.5] 2503.16420 SynCity: Training-Free Generation of 3D Worlds [{'name': 'Paul Engstler, Aleksandar Shtedritski, Iro Laina, Christian Rupprecht, Andrea Vedaldi'}] |
3D Generation 三维生成 | v2 3D generation textual descriptions tile-based generation |
Input: Textual descriptions 文本描述 Step1: Generate 3D tiles 生成3D瓦片 Step2: Stitch tiles together 拼接瓦片 Step3: Ensure geometric consistency 确保几何一致性 Output: Large and immersive 3D worlds 输出: 大型且沉浸式的3D世界 |
| 7.5 | [7.5] 2503.15886 Enhancing Zero-Shot Image Recognition in Vision-Language Models through Human-like Concept Guidance [{'name': 'Hui Liu, Wenya Wang, Kecheng Chen, Jie Liu, Yibing Liu, Tiexin Qin, Peisong He, Xinghao Jiang, Haoliang Li'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Zero-Shot Generalization Vision Language Models Concept-guided Reasoning |
Input: Zero-shot image recognition data 零样本图像识别数据 Step1: Concept modeling 概念建模 Step2: Importance sampling algorithm 重要性采样算法 Step3: Generate discriminative concepts 生成可区分的概念 Output: Enhanced zero-shot recognition results 改进的零样本识别结果 |
| 6.5 | [6.5] 2503.16365 JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse [{'name': 'Muyao Li, Zihao Wang, Kaichen He, Xiaojian Ma, Yitao Liang'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision Language Models decision-making |
Input: Vision Language models 视觉语言模型 Step1: Visual Language Post-Training 视觉语言后训练 Step2: Action decision-making 行为决策 Output: Enhanced decision-making capabilities 改进的决策能力 |
| 6.5 | [6.5] 2503.16397 Scale-wise Distillation of Diffusion Models [{'name': 'Nikita Starodubcev, Denis Kuznedelev, Artem Babenko, Dmitry Baranchuk'}] |
Image Generation 图像生成 | v2 Diffusion Models Text-to-Image Generation Generative Models |
Input: Low-resolution data 低分辨率数据 Step1: Scale-wise generation 逐尺度生成 Step2: Distribution matching 分布匹配 Step3: Resolution upscaling 分辨率提升 Output: High-quality generated images 高质量生成图像 |
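As a concrete illustration of the pruning step in 4DGS-1K (2503.16422 above): Gaussians that stay nearly transparent or are active for only a handful of frames contribute little to rendering and can be dropped. This toy sketch assumes per-Gaussian opacity and lifespan arrays; both thresholds are invented for illustration, not taken from the paper.

```python
import numpy as np

def prune_gaussians(opacity: np.ndarray,
                    lifespan_frames: np.ndarray,
                    opacity_thresh: float = 0.005,
                    min_lifespan: int = 3) -> np.ndarray:
    """Return indices of Gaussians to keep: discard those that are nearly
    transparent or alive for fewer than min_lifespan frames."""
    keep = (opacity > opacity_thresh) & (lifespan_frames >= min_lifespan)
    return np.flatnonzero(keep)
```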
Arxiv 2025-03-19
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | [9.5] 2503.13587 Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception [{'name': 'Dingkang Liang, Dingyuan Zhang, Xin Zhou, Sifan Tu, Tianrui Feng, Xiaofan Li, Yumeng Zhang, Mingyang Du, Xiao Tan, Xiang Bai'}] |
Unified World Model 统一世界模型 | v2 driving world model future prediction depth estimation autonomous driving |
Input: Current image 当前图像 Step1: Dual-Latent Sharing scheme 双潜在共享方案 Step2: Multi-scale Latent Interaction mechanism 多尺度潜在交互机制 Step3: Predict future image-depth pairs 预测未来图像-深度对 Output: Unified future predictions 统一的未来预测 |
| 9.5 | [9.5] 2503.13710 Improving Geometric Consistency for 360-Degree Neural Radiance Fields in Indoor Scenarios [{'name': 'Iryna Repinetska, Anna Hilsmann, Peter Eisert'}] |
Neural Rendering 神经渲染 | v2 Neural Radiance Fields 3D reconstruction depth estimation |
Input: 360-degree indoor images 360度室内图像 Step1: Dense depth priors calculation 密集深度先验计算 Step2: Novel depth loss function formulation 新的深度损失函数设计 Step3: Patch-based depth regularization implementation 贴片深度正则化实施 Output: Improved rendering quality 提高渲染质量 |
| 9.5 | [9.5] 2503.13721 SED-MVS: Segmentation-Driven and Edge-Aligned Deformation Multi-View Stereo with Depth Restoration and Occlusion Constraint [{'name': 'Zhenlong Yuan, Zhidong Yang, Yujun Cai, Kuangxin Wu, Mufan Liu, Dapeng Zhang, Hao Jiang, Zhaoxin Li, Zhaoqi Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction 三维重建 Multi-View Stereo 多视角立体 occlusion-aware reconstruction 遮挡感知重建 |
Input: Multi-view images 多视角图像 Step1: Panoptic segmentation for depth edge guidance 采用全景分割作为深度边缘指导 Step2: Multi-trajectory diffusion strategy to align patches with depth edges 多轨迹扩散策略以确保补丁与深度边缘对齐 Step3: Combine sparse points and monocular depth map to restore reliable depth map 结合稀疏点和单目深度图以恢复可靠的深度图 Output: Accurate 3D reconstruction of the scene or object 输出: 场景或对象的准确三维重建 |
| 9.5 | [9.5] 2503.13739 Learning from Synchronization: Self-Supervised Uncalibrated Multi-View Person Association in Challenging Scenes [{'name': 'Keqi Chen, Vinkle Srivastav, Didier Mutter, Nicolas Padoy'}] |
Multi-view Stereo 多视角立体 | v2 multi-view person association self-supervised learning geometric constraints |
Input: Multi-view RGB images 多视角RGB图像 Step1: Encoder-decoder model encoding 编码器-解码器模型编码 Step2: Self-supervised learning framework 自监督学习框架 Step3: Synchronization task for image pairs 图像对的同步任务 Output: Geometric feature encoding 几何特征编码 |
| 9.5 | [9.5] 2503.13743 MonoCT: Overcoming Monocular 3D Detection Domain Shift with Consistent Teacher Models [{'name': 'Johannes Meier, Louis Inchingolo, Oussema Dhaouadi, Yan Xia, Jacques Kaiser, Daniel Cremers'}] |
3D Object Detection 3D目标检测 | v2 Monocular 3D detection 单目3D检测 Depth estimation 深度估计 Domain adaptation 域适应 |
Input: Monocular RGB images 单目RGB图像 Step1: Generalized Depth Enhancement (GDE) module development 开发广义深度增强(GDE)模块 Step2: Pseudo Label Scoring (PLS) module design 设计伪标签评分(PLS)模块 Step3: Extensive experiments on multiple benchmarks 在多个基准上进行广泛实验 Output: Improved monocular 3D detection performance 改进的单目3D检测性能 |
| 9.5 | [9.5] 2503.13816 MOSAIC: Generating Consistent, Privacy-Preserving Scenes from Multiple Depth Views in Multi-Room Environments [{'name': 'Zhixuan Liu, Haokun Zhu, Rui Chen, Jonathan Francis, Soonmin Hwang, Ji Zhang, Jean Oh'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction privacy-preserving multi-view images depth images |
Input: Depth images only 仅深度图像 Step1: Multi-view overlapped scene alignment 多视角重叠场景对齐 Step2: Inference-time optimization 推断时优化 Step3: Generation of consistent RGB images 生成一致的RGB图像 Output: Privacy-preserving digital twins 保持隐私的数字双胞胎 |
| 9.5 | [9.5] 2503.13861 RAD: Retrieval-Augmented Decision-Making of Meta-Actions with Vision-Language Models in Autonomous Driving [{'name': 'Yujin Wang, Quanfeng Liu, Zhengxin Jiang, Tianyi Wang, Junfeng Jiao, Hongqing Chu, Bingzhao Gao, Hong Chen'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 vision-language models autonomous driving decision-making spatial perception |
Input: Vision-language models for autonomous driving 面向自主驾驶的视觉语言模型 Step1: Embedding flow for scene encoding 场景编码的嵌入流 Step2: Retrieval flow to fetch relevant scenes 检索流获取相关场景 Step3: Generating flow to produce meta-actions 生成流生成元动作 Output: Enhanced decision-making for autonomous driving 提升自主驾驶的决策能力 |
| 9.5 | [9.5] 2503.13914 PSA-SSL: Pose and Size-aware Self-Supervised Learning on LiDAR Point Clouds [{'name': 'Barza Nisar, Steven L. Waslander'}] |
3D Reconstruction and Modeling 3D重建与建模 | v2 3D semantic segmentation LiDAR point clouds self-supervised learning |
Input: LiDAR point clouds LiDAR点云 Step1: Define bounding box regression as pretext task 定义边界框回归作为预训练任务 Step2: Incorporate LiDAR beam pattern augmentation 融入激光雷达束模式增强 Step3: Train model using contrastive learning 采用对比学习训练模型 Output: Object pose and size-aware features 输出:物体姿态和尺寸感知特征 |
| 9.5 | [9.5] 2503.13948 Light4GS: Lightweight Compact 4D Gaussian Splatting Generation via Context Model [{'name': 'Mufan Liu, Qi Yang, He Huang, Wenjie Huang, Zhenlong Yuan, Zhu Li, Yiling Xu'}] |
3D Gaussian Splatting 3D高斯点云 | v2 4D Gaussian Splatting 3D Reconstruction Novel View Synthesis |
Input: Temporal deformation primitives 时间变形原语 Step1: Spatio-temporal significance pruning 空间-时间显著性修剪 Step2: Deep context model integration 深度上下文模型集成 Output: Compressed lightweight dynamic 3DGS 压缩轻量级动态3DGS |
| 9.5 | [9.5] 2503.14002 MeshFleet: Filtered and Annotated 3D Vehicle Dataset for Domain Specific Generative Modeling [{'name': 'Damian Boborzi, Phillip Mueller, Jonas Emrich, Dominik Schmid, Sebastian Mueller, Lars Mikelsons'}] |
3D Generation 三维生成 | v2 3D reconstruction 3D dataset generative modeling vehicle models |
Input: 3D models from Objaverse-XL 来自Objaverse-XL的3D模型 Step1: Create a manually labeled subset 创建手动标记子集 Step2: Train a quality classifier 训练质量分类器 Step3: Apply automated filtering 应用自动化过滤 Output: High-quality filtered 3D vehicle dataset 输出:高质量过滤的3D车辆数据集 |
| 9.5 | [9.5] 2503.14029 Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting [{'name': 'Runsong Zhu, Shi Qiu, Zhengzhe Liu, Ka-Hei Hui, Qianyi Wu, Pheng-Ann Heng, Chi-Wing Fu'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D segmentation Gaussian splatting computer vision |
Input: Multi-view 2D instance segmentation 2D实例分割 Step1: Gaussian-level feature augmentation 高斯级特征增强 Step2: Object-level codebook learning 对象级别的词汇表学习 Step3: Association learning 关联学习 Step4: Noisy label filtering 噪声标签过滤 Output: Accurate 3D scene segmentation 准确的3D场景分割 |
| 9.5 | [9.5] 2503.14198 RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images [{'name': 'Junjin Xiao, Qing Zhang, Yonewei Nie, Lei Zhu, Wei-Shi Zheng'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction novel view synthesis Gaussian splatting autonomous driving |
Input: Sparse multi-view images 稀疏多视角图像 Step1: Lift SMPL vertices to 3D points 提升SMPL顶点到3D点 Step2: Predict image-aligned 3D prior points 预测与图像对齐的3D先验点 Step3: Regress coarse and fine Gaussian parameters 回归粗糙和细粒度的高斯参数 Output: High-fidelity novel views 高保真新视图 |
| 9.5 | [9.5] 2503.14219 Segmentation-Guided Neural Radiance Fields for Novel Street View Synthesis [{'name': 'Yizhou Li, Yusuke Monno, Masatoshi Okutomi, Yuuichi Tanaka, Seiichi Kataoka, Teruaki Kosiba'}] |
Neural Rendering 神经渲染 | v2 Neural Radiance Fields 3D reconstruction novel view synthesis outdoor scenes |
Input: Monocular video clips captured by a video recorder mounted on a car. Step1: Segmentation mask generation using Grounded SAM. Step2: Handling transient objects by excluding them from training. Step3: Modeling the sky with a specialized representation. Step4: Regularizing the ground plane to conform to planar geometry. Step5: Adapting to inconsistent lighting through appearance embeddings. Output: Improved novel view synthesis quality with fewer artifacts. |
| 9.5 | [9.5] 2503.14274 Improving Adaptive Density Control for 3D Gaussian Splatting [{'name': 'Glenn Grubert, Florian Barthel, Anna Hilsmann, Peter Eisert'}] |
3D Gaussian Splatting 三维高斯点云 | v2 3D reconstruction Gaussian Splatting novel view synthesis |
Input: Multi-view images 多视角图像 Step1: Adaptive density control for Gaussian management 自适应密度控制以管理高斯 Step2: Implement exponential gradient thresholding 实施指数梯度阈值 Step3: Calculate corrected scene extent 计算纠正后的场景范围 Step4: Execute significance-aware pruning 执行重要性感知修剪 Output: Enhanced rendering quality 改进的渲染质量 |
| 9.5 | [9.5] 2503.14346 3D Densification for Multi-Map Monocular VSLAM in Endoscopy [{'name': "X. Anadón, Javier Rodríguez-Puigvert, J. M. M. Montiel"}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction endoscopy visual SLAM CudaSIFT depth estimation |
Input: Monocular endoscopic sequences 单目内窥镜序列 Step1: Remove outliers 去除异常值 Step2: Densify maps 加密地图 Step3: Align predictions and submaps 对齐预测和子地图 Output: Reliable densified 3D maps 可靠的加密3D地图 |
| 9.5 | [9.5] 2503.14445 Bolt3D: Generating 3D Scenes in Seconds [{'name': 'Stanislaw Szymanowicz, Jason Y. Zhang, Pratul Srinivasan, Ruiqi Gao, Arthur Brussee, Aleksander Holynski, Ricardo Martin-Brualla, Jonathan T. Barron, Philipp Henzler'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D scene generation latent diffusion model multiview images |
Input: One or multiple images 单张或多张图像 Step1: Create large-scale multiview-consistent dataset 创建大规模多视角一致性数据集 Step2: Train latent diffusion model 训练潜在扩散模型 Step3: Generate 3D scene representation 生成三维场景表示 Output: Fast 3D scene representation generation 快速生成三维场景表示 |
| 9.5 | [9.5] 2503.14463 SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model [{'name': 'Yucheng Mao, Boyang Wang, Nilesh Kulkarni, Jeong Joon Park'}] |
3D Reconstruction 三维重建 | v2 3D reconstruction image restoration multi-view diffusion model |
Input: Multi-view images 多视角图像 Step1: Jointly denoise multiple photographs 联合去噪多个影像 Step2: Implement a multi-view diffusion model 实施多视角扩散模型 Step3: Maintain 3D consistency 维护三维一致性 Output: Restored images with improved quality 修复后图像,质量提升 |
| 9.5 | [9.5] 2503.14483 Multi-view Reconstruction via SfM-guided Monocular Depth Estimation [{'name': 'Haoyu Guo, He Zhu, Sida Peng, Haotong Lin, Yunzhi Yan, Tao Xie, Wenguan Wang, Xiaowei Zhou, Hujun Bao'}] |
3D Reconstruction 三维重建 | v2 3D Reconstruction 三维重建 Monocular Depth Estimation 单目深度估计 SfM-guided Reconstruction SfM引导重建 Multi-view Geometry 多视角几何 |
Input: Multi-view images 多视角图像 Step1: Recover the SfM point cloud 恢复SfM点云 Step2: Inject SfM information into the diffusion model 将SfM信息注入扩散模型 Step3: Predict depth maps 预测深度图 Step4: Fuse depth maps for 3D reconstruction 进行深度图融合以实现3D重建 Output: High-quality 3D models 高质量的3D模型 |
| 9.2 | [9.2] 2503.13869 Robust3D-CIL: Robust Class-Incremental Learning for 3D Perception [{'name': 'Jinge Ma, Jiangpeng He, Fengqing Zhu'}] |
3D Perception 3D感知 | v2 3D perception class-incremental learning autonomous driving |
Input: 3D point cloud data 3D点云数据 Step1: Develop a robust 3D point cloud class-incremental learning framework 设计一个稳健的3D点云类增量学习框架 Step2: Implement an exemplar selection strategy based on Farthest Point Sampling 实施基于最远点采样的样本选择策略 Step3: Introduce a point cloud downsampling-based replay method 引入基于点云降采样的重放方法 Output: Improved adaptability and robustness in 3D perception models 输出: 提高3D感知模型的适应性和稳健性 |
| 9.2 | [9.2] 2503.13952 SimWorld: A Unified Benchmark for Simulator-Conditioned Scene Generation via World Model [{'name': 'Xinqing Li, Ruiqi Song, Qingyu Xie, Ye Wu, Nanxin Zeng, Yunfeng Ai'}] |
Autonomous Driving 自动驾驶 | v2 simulator-conditioned scene generation autonomous driving data generation |
Input: Simulation conditions based on real-world data 真实数据的模拟条件 Step1: Scene simulation for data generation 场景模拟以生成数据 Step2: Label alignment with real-world conditions 标签与真实世界条件的对齐 Step3: Benchmark evaluation for generated data 生成数据的基准评估 Output: Large-scale diverse datasets for autonomous driving applications 大规模多样化数据集,用于自动驾驶应用 |
| 9.2 | [9.2] 2503.13982 A-SCoRe: Attention-based Scene Coordinate Regression for wide-ranging scenarios [{'name': 'Huy-Hoang Bui, Bach-Thuan Bui, Quang-Vinh Tran, Yasuyuki Fujii, Joo-Ho Lee'}] |
Visual Localization 视觉定位 | v2 scene coordinate regression visual localization robotics |
Input: Images from multiple modalities 多种模式的图像 Step1: Descriptor extraction 描述符提取 Step2: Attention-based scene coordinate regression 基于注意力的场景坐标回归 Step3: Camera pose estimation 相机姿态估计 Output: Estimated camera poses 估计的相机姿态 |
| 9.2 | [9.2] 2503.14493 State Space Model Meets Transformer: A New Paradigm for 3D Object Detection [{'name': 'Chuxin Wang, Wenfei Yang, Xiang Liu, Tianzhu Zhang'}] |
3D Object Detection 3D目标检测 | v2 3D object detection state space model transformer |
Input: 3D point clouds 3D点云 Step1: Model state-dependent parameters 模型状态依赖参数 Step2: Implement interaction mechanisms 实现互动机制 Step3: Conduct experiments on datasets 在数据集上进行实验 Output: Enhanced object detection performance 改进的目标检测性能 |
| 9.2 | [9.2] 2503.14498 Tracking Meets Large Multimodal Models for Driving Scenario Understanding [{'name': 'Ayesha Ishaq, Jean Lahoud, Fahad Shahbaz Khan, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer'}] |
Autonomous Driving 自动驾驶 | v2 Large Multimodal Models Autonomous Driving 3D Spatial Understanding |
Input: Tracking information and visual data 跟踪信息和视觉数据 Step1: Integrate tracking data into Large Multimodal Models (LMMs) 将跟踪数据集成到大型多模态模型中 Step2: Self-supervised pretraining of the tracking encoder 跟踪编码器的自监督预训练 Step3: Enhance perception, planning, and prediction tasks 增强感知、规划和预测任务 Output: Improved decision-making in dynamic driving environments 输出:在动态驾驶环境中改善决策 |
| 8.5 | [8.5] 2503.13778 Using 3D reconstruction from image motion to predict total leaf area in dwarf tomato plants [{'name': 'Dmitrii Usenko, David Helman, Chen Giladi'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D reconstruction leaf area estimation machine learning precision agriculture |
Input: Sequential 3D reconstructions from RGB images 从RGB图像的序列3D重建 Step1: Data integration 数据集成 Step2: 3D reconstruction algorithms development 3D重建算法开发 Step3: Leaf area estimation 叶面积估计 Output: Estimated total leaf area (TLA) 估计的总叶面积(TLA) |
| 8.5 | [8.5] 2503.13792 Identifying and Mitigating Position Bias of Multi-image Vision-Language Models [{'name': 'Xinyu Tian, Shu Zou, Zhaoyuan Yang, Jing Zhang'}] |
VLM & VLA 视觉语言模型与对齐 | v2 Vision-Language Models (VLMs) 视觉语言模型 Position Bias 位置偏差 Multi-Image Reasoning 多图像推理 |
Input: Multi-image inputs 多图像输入 Step1: Introduce Position-wise Question Answering (PQA) 引入位置敏感问答任务 Step2: Analyze position bias 分析位置偏差 Step3: Propose SoFt Attention (SoFA) 提出SoFt Attention方法 Output: Mitigated position bias 减轻位置偏差 |
| 8.5 | [8.5] 2503.13858 MamBEV: Enabling State Space Models to Learn Birds-Eye-View Representations [{'name': 'Hongyu Ke, Jack Morris, Kentaro Oguchi, Xiaofei Cao, Yongkang Liu, Haoxin Wang, Yi Ding'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D visual perception autonomous driving bird's-eye view |
Input: Multi-camera images 多摄像头图像 Step1: Spatial Cross Mamba integration 空间交叉Mamba集成 Step2: Unified BEV representation generation 统一的BEV表示生成 Step3: Computational efficiency assessment 计算效率评估 Output: Enhanced BEV representation 改进的BEV表示 |
| 8.5 | [8.5] 2503.13891 Where do Large Vision-Language Models Look at when Answering Questions? [{'name': 'Xiaoying Xing, Chia-Wen Kuo, Li Fuxin, Yulei Niu, Fan Chen, Ming Li, Ying Wu, Longyin Wen, Sijie Zhu'}] |
VLM & VLA 视觉语言模型与视觉语言对齐 | v2 Vision-Language Models 视觉语言模型 visual attention 视觉关注 multimodal tasks 多模态任务 |
Input: Large Vision-Language Models (LVLMs) 大型视觉语言模型 Step1: Extend heatmap visualization methods 扩展热图可视化方法 Step2: Select visually relevant tokens 选择视觉相关标记 Step3: Conduct analysis on LVLMs 进行LVLM分析 Output: Insights into visual understanding and attention regions 输出:视觉理解和注意区域的洞察 |
| 8.5 | [8.5] 2503.13926 Learning Shape-Independent Transformation via Spherical Representations for Category-Level Object Pose Estimation [{'name': 'Huan Ren, Wenfei Yang, Xiang Liu, Shifeng Zhang, Tianzhu Zhang'}] |
3D Pose Estimation 3D姿态估计 | v2 object pose estimation spherical representations 3D reconstruction |
Input: Observed object points 观察目标点 Step1: Feature extraction 特征提取 Step2: Spherical projection to HEALPix grids 将点投影到HEALPix网格 Step3: Correspondence prediction 对应关系预测 Output: Predict object pose and size 预测目标的姿态和尺寸 |
| 8.5 | [8.5] 2503.13938 ChatBEV: A Visual Language Model that Understands BEV Maps [{'name': 'Qingyao Xu, Siheng Chen, Guang Chen, Yanfeng Wang, Ya Zhang'}] |
VLM & VLA 视觉语言模型与视觉语言对齐 | v2 BEV maps traffic scene understanding Vision-Language Models autonomous driving |
Input: BEV maps (Bird's-Eye View maps) BEV地图 Step1: Dataset construction using novel collection pipeline 数据集构建使用新收集管道 Step2: Fine-tune vision-language model ChatBEV on the dataset 在数据集上微调视觉语言模型ChatBEV Step3: Implement language-driven traffic scene generation pipeline 实施语言驱动的交通场景生成管道 Output: Enhanced understanding and generation of traffic scenarios 改进的交通场景理解与生成 |
| 8.5 | [8.5] 2503.13946 Is Discretization Fusion All You Need for Collaborative Perception? [{'name': 'Kang Yang, Tianci Bu, Lantao Li, Chunxu Li, Yongcai Wang, Deying Li'}] |
Autonomous Systems and Robotics 自动驾驶与机器人 | v2 Collaborative perception 协作感知 3D object detection 三维物体检测 |
Input: Features from multi-view images 由多视角图像提取的特征 Step1: Generate anchor proposals 生成锚点提案 Step2: Select confident features 选择高置信度特征 Step3: Perform local-global fusion 执行局部-全局融合 Output: Enhanced object detection results 改进的物体检测结果 |
| 8.5 | [8.5] 2503.13951 FrustumFusionNets: A Three-Dimensional Object Detection Network Based on Tractor Road Scene [{'name': 'Lili Yang, Mengshuai Chang, Xiao Guo, Yuxin Feng, Yiwen Mei, Caicong Wu'}] |
3D Object Detection 三维对象检测 | v2 3D object detection 三维对象检测 frustum-based methods 棱锥法 agricultural machinery 农业机械 |
Input: Multi-source sensor data (LiDAR and camera) 输入: 多源传感器数据(激光雷达和相机) Step1: Generate 2D object detection results to narrow search areas in 3D point cloud 第一步: 生成二维对象检测结果以缩小三维点云的搜索区域 Step2: Apply Gaussian mask to enhance point cloud information 第二步: 应用高斯掩模以增强点云信息 Step3: Extract features from both frustum point cloud and crop images 第三步: 从棱锥点云和作物图像中提取特征 Output: Concatenated features for 3D object detection 输出: 用于三维对象检测的连接特征 |
| 8.5 | [8.5] 2503.14001 Multimodal Feature-Driven Deep Learning for the Prediction of Duck Body Dimensions and Weight [{'name': 'Yi Xiao, Qiannan Han, Guiping Liang, Hongyan Zhang, Song Wang, Zhihao Xu, Weican Wan, Chuang Li, Guitao Jiang, Wenbo Xiao'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D point clouds multimodal data deep learning weight estimation body dimension prediction |
Input: 2D RGB images, depth images, 3D point clouds from multiple views 2D RGB图像、深度图像和来自多个视角的3D点云 Step1: Data collection and preprocessing 数据收集与预处理 Step2: Feature extraction using PointNet++ 特征提取使用PointNet++ Step3: Fusion of 2D and 3D features 2D和3D特征融合 Step4: Model training and evaluation 模型训练与评估 Output: Predicted body dimensions and weight 预测的体型尺寸和体重 |
| 8.5 | [8.5] 2503.14097 SCJD: Sparse Correlation and Joint Distillation for Efficient 3D Human Pose Estimation [{'name': 'Weihong Chen, Xuemiao Xu, Haoxin Yang, Yi Xie, Peng Xiao, Cheng Xu, Huaidong Zhang, Pheng-Ann Heng'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D human pose estimation knowledge distillation |
Input: Multi-frame input sequences 多帧输入序列 Step1: Sparse correlation input sequence downsampling 稀疏相关输入序列下采样 Step2: Dynamic joint spatial attention distillation 动态关节空间注意力蒸馏 Step3: Temporal consistency distillation 时间一致性蒸馏 Output: Accurate 3D human pose predictions 精确的三维人体姿态预测 |
| 8.5 | [8.5] 2503.14154 RBFIM: Perceptual Quality Assessment for Compressed Point Clouds Using Radial Basis Function Interpolation [{'name': 'Zhang Chen, Shuai Wan, Siyu Ren, Fuzheng Yang, Mengting Yu, Junhui Hou'}] |
Point Cloud Processing 点云处理 | v2 point cloud quality assessment perceptual quality compression |
Input: Distorted point clouds 失真点云 Step1: Convert discrete point features to continuous feature function 将离散点特征转换为连续特征函数 Step2: Establish bijective feature sets 建立双射特征集 Step3: Evaluate perceptual quality 评估感知质量 Output: Enhanced quality assessment 改进的质量评估 |
| 8.5 | [8.5] 2503.14171 Lightweight Gradient-Aware Upscaling of 3D Gaussian Splatting Images [{'name': 'Simon Niedermayr, Christoph Neuhauser, Rüdiger Westermann'}] |
Neural Rendering 神经渲染 | v2 3D Gaussian Splatting image upscaling novel view synthesis |
Input: Low-resolution 3D Gaussian Splatting renderings 低分辨率3D高斯点云渲染 Step1: Image gradient analysis 图像梯度分析 Step2: Gradient-based bicubic spline interpolation 基于梯度的双三次样条插值 Step3: Integration into 3DGS optimization 将其集成到3DGS优化中 Output: High-resolution images with enhanced quality 高分辨率图像和增强质量 |
| 8.5 | [8.5] 2503.14244 Deep Unsupervised Segmentation of Log Point Clouds [{'name': 'Fedor Zolotarev, Tuomas Eerola, Tomi Kauppi'}] |
Point Cloud Processing 点云处理 | v2 point cloud segmentation timber logs 3D reconstruction |
Input: Surface point clouds 表面点云 Step1: Unsupervised segmentation 无监督分割 Step2: Geometrical property analysis 几何属性分析 Step3: Model evaluation 模型评估 Output: Accurate log surface points 准确的原木表面点 |
| 8.5 | [8.5] 2503.14359 ImViD: Immersive Volumetric Videos for Enhanced VR Engagement [{'name': 'Zhengxian Yang, Shi Pan, Shengqi Wang, Haoxiang Wang, Li Lin, Guanjun Li, Zhengqi Wen, Borong Lin, Jianhua Tao, Tao Yu'}] |
3D Reconstruction and Modeling 三维重建 | v2 immersive volumetric videos 3D reconstruction multi-view capture |
Input: Multi-view, multi-modal audio-video data 多视角, 多模态音视频数据 Step1: Data capture 进行数据捕获 Step2: Benchmarking existing methods 对现有方法进行基准测试 Step3: Developing a pipeline for reconstruction 开发重建管道 Output: Immersive volumetric videos 生成沉浸式体积视频 |
| 8.5 | [8.5] 2503.14405 DUNE: Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers [{'name': 'Mert Bulent Sariyildiz, Philippe Weinzaepfel, Thomas Lucas, Pau de Jorge, Diane Larlus, Yannis Kalantidis'}] |
3D Understanding 3D理解 | v2 heterogeneous teacher distillation depth estimation 3D understanding |
Input: Various heterogeneous teacher models 诸多异构教师模型 Step1: Define heterogeneous teacher distillation 定义异构教师蒸馏 Step2: Explore data-sharing strategies 探索数据共享策略 Step3: Design and evaluate the projector architecture 设计并评估投影器架构 Output: Universal encoder capable of 2D and 3D tasks 能够进行2D和3D任务的通用编码器 |
| 8.5 | [8.5] 2503.14489 Stable Virtual Camera: Generative View Synthesis with Diffusion Models [{'name': 'Jensen (Jinghao) Zhou, Hang Gao, Vikram Voleti, Aaryaman Vasishta, Chun-Han Yao, Mark Boss, Philip Torr, Christian Rupprecht, Varun Jampani'}] |
Multi-view and Stereo Vision 多视角和立体视觉 | v2 Novel View Synthesis 新视图合成 Diffusion Models 扩散模型 3D Reconstruction 三维重建 |
Input: Any number of input views and target cameras 任意数量的输入视图和目标相机 Step1: Model design 模型设计 Step2: Training strategy 训练策略 Step3: Sampling method 采样方法 Output: Novel views of a scene 场景的新视图 |
| 8.5 | [8.5] 2503.14492 Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control [{'name': 'NVIDIA, :, Hassan Abu Alhaija, Jose Alvarez, Maciej Bala, Tiffany Cai, Tianshi Cao, Liz Cha, Joshua Chen, Mike Chen, Francesco Ferroni, Sanja Fidler, Dieter Fox, Yunhao Ge, Jinwei Gu, Ali Hassani, Michael Isaev, Pooya Jannaty, Shiyi Lan, Tobias Lasser, Huan Ling, Ming-Yu Liu, Xian Liu, Yifan Lu, Alice Luo, Qianli Ma, Hanzi Mao, Fabio Ramos, Xuanchi Ren, Tianchang Shen, Shitao Tang, Ting-Chun Wang, Jay Wu, Jiashu Xu, Stella Xu, Kevin Xie, Yuchong Ye, Xiaodong Yang, Xiaohui Zeng, Yu Zeng'}] |
Image and Video Generation 图像生成 | v2 world generation diffusion models autonomous driving robotics |
Input: Conditional multi-modal inputs (segmentation, depth, edge) 条件多模态输入(分割,深度,边缘) Step1: Adaptive weighting of conditional inputs 自适应加权条件输入 Step2: World generation using Conditional Diffusion Model 使用条件扩散模型生成世界 Output: Real-time world simulations 实时世界模拟 |
| 8.5 | [8.5] 2503.14501 Advances in 4D Generation: A Survey [{'name': 'Qiaowei Miao, Kehan Li, Jinsheng Quan, Zhiyuan Min, Shaojie Ma, Yichao Xu, Yi Yang, Yawei Luo'}] |
Image and Video Generation 图像生成与视频生成 | v2 4D generation autonomous driving dynamic modeling |
Input: 4D data representations 4D数据表示 Step1: Survey of existing technologies 现有技术的调查 Step2: Literature review 文献综述 Step3: Challenges and opportunities analysis 挑战与机遇分析 Output: Comprehensive understanding of 4D generation 4D生成的全面理解 |
| 8.0 | [8.0] 2503.13652 Web Artifact Attacks Disrupt Vision Language Models [{'name': 'Maan Qraitem, Piotr Teterwak, Kate Saenko, Bryan A. Plummer'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models artifact attacks model robustness |
Input: Vision-language models (VLMs) 视觉语言模型 Step1: Identify artifact-based attacks 识别伪影攻击 Step2: Develop automated mining pipeline 开发自动化挖掘管道 Step3: Optimize attacks and evaluate effectiveness 优化攻击并评估有效性 Output: Enhanced understanding of model vulnerabilities 改进对模型脆弱性的理解 |
| 8.0 | [8.0] 2503.13939 Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models [{'name': 'Yuxiang Lai, Jike Zhong, Ming Li, Shitian Zhao, Xiaofeng Yang'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models Medical Imaging Reinforcement Learning |
Input: Medical imaging data 医学图像数据 Step1: Implement reinforcement learning framework 实施强化学习框架 Step2: Optimize reasoning paths using GRPO 优化推理路径使用GRPO Step3: Evaluate model across different imaging modalities 评估模型在不同成像模式下的性能 Output: Enhanced generalization and trustworthiness 增强的泛化和可信性 |
| 7.5 | [7.5] 2503.13966 FlexVLN: Flexible Adaptation for Diverse Vision-and-Language Navigation Tasks [{'name': 'Siqi Zhang, Yanyuan Qiao, Qunbo Wang, Longteng Guo, Zhihua Wei, Jing Liu'}] |
VLM & VLA 视觉语言模型与视觉语言对齐 | v2 Vision-and-Language Navigation Large Language Models |
Input: Visual input and natural language instructions 视觉输入与自然语言指令 Step1: Generate high-level navigation plan 生成高层导航计划 Step2: Validate guidance feasibility 验证指导的可行性 Step3: Execute navigation actions 执行导航动作 Output: Target location reached 到达目标位置 |
| 7.5 | [7.5] 2503.14161 CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models [{'name': 'Yiqi Zhu, Ziyue Wang, Can Zhang, Peng Li, Yang Liu'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models Continuous Space Perception Spatial Reasoning |
Input: Multi-image sequences 多图像序列 Step1: Define continuous space perception 定义连续空间感知 Step2: Develop benchmark tasks 开发基准任务 Step3: Evaluate models across tasks 在任务中评估模型 Output: Performance metrics 性能指标 |
| 7.5 | [7.5] 2503.14277 Towards synthetic generation of realistic wooden logs [{'name': 'Fedor Zolotarev, Borek Reich, Tuomas Eerola, Tomi Kauppi, Pavel Zemcik'}] |
3D Generation 三维生成 | v2 3D representation synthetic generation wooden logs |
Input: Specifications of wooden logs 木材参数 Step1: Internal knot generation 内部结的生成 Step2: Centerline generation 中心线的生成 Step3: Surface generation 表面生成 Output: Realistic 3D models of wooden logs 逼真的木材三维模型 |
| 7.5 | [7.5] 2503.14402 Diffusion-based Facial Aesthetics Enhancement with 3D Structure Guidance [{'name': 'Lisha Li, Jingwen Hou, Weide Liu, Yuming Fang, Jiebin Yan'}] |
Image Generation 图像生成 | v2 Facial Aesthetics Enhancement 3D structure guidance Diffusion model Facial beautification |
Input: 2D facial images 2D面部图像 Step1: Nearest Neighbor Face Searching (NNFS) module 寻找最近邻面孔 Step2: Facial Guidance Extraction (FGE) module 提取面部引导 Step3: Face Beautification (FB) module 面部美化 Output: Enhanced facial images 改进的面部图像 |
| 7.0 | [7.0] 2503.14075 Growing a Twig to Accelerate Large Vision-Language Models [{'name': 'Zhenwei Shao, Mingyang Wang, Zhou Yu, Wenwen Pan, Yan Yang, Tao Wei, Hongyuan Zhang, Ning Mao, Wei Chen, Jun Yu'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models 视觉语言模型 VLM acceleration VLM加速 Token pruning 标记修剪 |
Input: Base VLM architecture 基础视觉语言模型架构 Step1: Twig-guided token pruning twig引导的标记修剪 Step2: Self-speculative decoding 自我推测解码 Output: Accelerated VLM performance 加速的视觉语言模型性能 |
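Each entry above embeds its arXiv identifier, so any listed paper can be cross-checked against the official arXiv record. Below is a minimal sketch of such a lookup against the public arXiv export API; the helper name `fetch_arxiv_metadata` and the example ID are illustrative, and this is not the repository's fetch script.

```python
# Minimal sketch: look up one listed arXiv ID via the public export API.
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the arXiv feed

def fetch_arxiv_metadata(arxiv_id: str) -> dict:
    """Fetch title, abstract, and authors for one arXiv ID, e.g. '2503.14445'."""
    url = f"http://export.arxiv.org/api/query?id_list={arxiv_id}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        root = ET.fromstring(resp.read())
    entry = root.find(f"{ATOM}entry")
    if entry is None:
        raise ValueError(f"no entry returned for {arxiv_id}")
    return {
        "id": arxiv_id,
        # The feed wraps long titles/abstracts; collapse the line breaks.
        "title": " ".join(entry.findtext(f"{ATOM}title", "").split()),
        "abstract": " ".join(entry.findtext(f"{ATOM}summary", "").split()),
        "authors": [a.findtext(f"{ATOM}name", "")
                    for a in entry.findall(f"{ATOM}author")],
    }

if __name__ == "__main__":
    # e.g. Bolt3D (2503.14445), listed in the table above
    print(fetch_arxiv_metadata("2503.14445")["title"])
```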
Arxiv 2025-03-12
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | [9.5] 2503.07739 SIRE: SE(3) Intrinsic Rigidity Embeddings [{'name': 'Cameron Smith, Basile Van Hoorick, Vitor Guizilini, Yue Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction dynamic scene reconstruction self-supervised learning |
Input: Videos of casually captured scenes 随意拍摄的场景视频 Step1: Estimate scene rigidity and geometry 估计场景刚性与几何 Step2: Use a least-squares solver to lift 2D trajectories into SE(3) tracks 使用最小二乘解算器将2D轨迹提升至SE(3)轨迹 Step3: Re-project back to 2D and compare against original trajectories 重新投影回2D并与原始轨迹比较 Output: Rigid scene structure and embeddings 刚性场景结构及嵌入 |
| 9.5 | [9.5] 2503.07743 SANDRO: a Robust Solver with a Splitting Strategy for Point Cloud Registration [{'name': 'Michael Adlerstein, João Carlos Virgolino Soares, Angelo Bratta, Claudio Semini'}] |
3D Reconstruction and Modeling 三维重建 | v2 point cloud registration 3D modeling |
Input: Point cloud data 点云数据 Step1: Initial outlier detection 初始异常值检测 Step2: Robust optimization with GNC 采用GNC的稳健优化 Step3: Splitting strategy implementation 拆分策略实现 Output: Accurate point cloud alignment 准确的点云对齐 |
| 9.5 | [9.5] 2503.07819 POp-GS: Next Best View in 3D-Gaussian Splatting with P-Optimality [{'name': 'Joey Wilson, Marcelino Almeida, Sachit Mahajan, Martin Labrie, Maani Ghaffari, Omid Ghasemalizadeh, Min Sun, Cheng-Hao Kuo, Arnab Sen'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting active perception uncertainty quantification |
Input: Multi-view images 多视角图像 Step1: Derivation of covariance matrix 协方差矩阵的推导 Step2: Application of optimal experimental design 最优实验设计的应用 Step3: Quantification of information gain 信息增益的量化 Output: Enhanced strategies for 3D Gaussian Splatting 改进的三维高斯点云策略 |
| 9.5 | [9.5] 2503.07828 Neural Radiance and Gaze Fields for Visual Attention Modeling in 3D Environments [{'name': 'Andrei Chubarau, Yinan Wang, James J. Clark'}] |
Neural Rendering 神经渲染 | v2 Neural Radiance Fields visual attention 3D environments gaze prediction |
Input: 2D images of a 3D scene 3D场景的2D图像 Step1: NeRF training NeRF训练 Step2: Gaze prediction network training 注视预测网络训练 Step3: Gaze visualization and mapping to 3D structure 注意力可视化和3D结构映射 Output: Visual attention patterns and rendered images 可视化注意模式和渲染图像 |
| 9.5 | [9.5] 2503.07874 Topology-Preserving Loss for Accurate and Anatomically Consistent Cardiac Mesh Reconstruction [{'name': 'Chenyu Zhang, Yihao Luo, Yinzhe Wu, Choon Hwai Yap, Guang Yang'}] |
3D Reconstruction and Modeling 三维重建 | v2 cardiac mesh reconstruction topology-preserving loss |
Input: Volumetric data 体积数据 Step1: Identify topology-violating points 确定违反拓扑结构的点 Step2: Apply Topology-Preserving Mesh Loss 应用拓扑保护网格损失 Step3: Perform mesh deformation and optimization 执行网格形变与优化 Output: Accurate and anatomically consistent cardiac meshes 准确且解剖上一致的心脏网格 |
| 9.5 | [9.5] 2503.07940 BUFFER-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes [{'name': 'Minkyun Seo, Hyungtae Lim, Kanghee Lee, Luca Carlone, Jaesik Park'}] |
Point Cloud Processing 点云处理 | v2 Point Cloud Registration 点云注册 Generalization 泛化能力 Zero-Shot Learning 零样本学习 |
Input: Point cloud data 点云数据 Step1: Identify limitations of existing methods 识别现有方法的局限性 Step2: Develop zero-shot registration framework 开发零样本注册框架 Step3: Implement adaptive voxel size and search radii 实现自适应体素大小和搜索半径 Output: Robust point cloud registration pipeline 稳健的点云注册管道 |
| 9.5 | [9.5] 2503.07952 NeRF-VIO: Map-Based Visual-Inertial Odometry with Initialization Leveraging Neural Radiance Fields [{'name': 'Yanyu Zhang, Dongming Wang, Jie Xu, Mengyuan Liu, Pengxiang Zhu, Wei Ren'}] |
3D Reconstruction and Modeling 三维重建 | v2 visual-inertial odometry neural radiance fields augmented reality |
Input: Captured images and pre-trained NeRF model 采集的图像和预训练的NeRF模型 Step1: Initialize first IMU state 初始化第一个IMU状态 Step2: Define loss function based on geodesic distance 构建基于测地距离的损失函数 Step3: Integrate captured and rendered images 融合采集图像与渲染图像 Output: Updated poses and NeRF-based rendering 更新的位姿和基于NeRF的渲染 |
| 9.5 | [9.5] 2503.08005 CDI3D: Cross-guided Dense-view Interpolation for 3D Reconstruction [{'name': 'Zhiyuan Wu, Xibin Song, Senbo Wang, Weizhe Liu, Jiayu Yang, Ziang Cheng, Shenzhou Chen, Taizhang Shang, Weixuan Sun, Shan Luo, Pan Ji'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction image-to-3D generation multi-view consistency 2D diffusion models |
Input: Single RGB image 单个RGB图像 Step1: Generate main views using a 2D diffusion model 使用2D扩散模型生成主要视图 Step2: Apply Dense View Interpolation (DVI) for additional view synthesis 使用密集视图插值(DVI)进行附加视图合成 Step3: Tri-plane-based mesh reconstruction to create 3D mesh 使用三平面网格重建创建3D网格 Output: High-quality 3D meshes with improved texture and geometry 输出: 具有改善纹理和几何形状的高质量3D网格 |
| 9.5 | [9.5] 2503.08092 SparseVoxFormer: Sparse Voxel-based Transformer for Multi-modal 3D Object Detection [{'name': 'Hyeongseok Son, Jia He, Seung-In Park, Ying Min, Yunhao Zhang, ByungIn Yoo'}] |
3D Object Detection 三维物体检测 | v2 3D Object Detection Sparse Voxel Features Autonomous Driving |
Input: Multi-modal data (LiDAR and camera) 多模态数据(LiDAR和相机) Step1: Feature Extraction 特征提取 Step2: Sparse Voxel Representation 稀疏体素表示 Step3: Transformer-based Detection 基于变压器的检测 Output: Detected 3D objects 检测到的三维物体 |
| 9.5 | [9.5] 2503.08093 MVGSR: Multi-View Consistency Gaussian Splatting for Robust Surface Reconstruction [{'name': 'Chenfeng Hou, Qi Xun Yeo, Mengqi Guo, Yongxin Su, Yanyan Li, Gim Hee Lee'}] |
Surface Reconstruction 表面重建 | v2 3D reconstruction Gaussian splatting surface reconstruction multi-view consistency |
Input: Multi-view images 多视角图像 Step1: Feature extraction 特征提取 Step2: Distractor mask generation 干扰物遮罩生成 Step3: Gaussian pruning 高斯剪枝 Step4: Surface reconstruction 表面重建 Output: Enhanced 3D models 改进的三维模型 |
| 9.5 | [9.5] 2503.08135 ArticulatedGS: Self-supervised Digital Twin Modeling of Articulated Objects using 3D Gaussian Splatting [{'name': 'Junfu Guo, Yu Xin, Gaoyi Liu, Kai Xu, Ligang Liu, Ruizhen Hu'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting articulated objects 3D reconstruction self-supervised learning digital twins |
Input: Multi-view imagery of articulated objects 多视角图像 Step 1: Concurrent part-level reconstruction 部件级同时重建 Step 2: Multi-step optimization of parameters 多步骤参数优化 Step 3: Model formation using 3D Gaussian representations 使用3D高斯模型形成 Output: Digital twins of articulated objects in 3D digital format 输出: 3D数字格式的物体数字双胞胎 |
| 9.5 | [9.5] 2503.08140 HOTFormerLoc: Hierarchical Octree Transformer for Versatile Lidar Place Recognition Across Ground and Aerial Views [{'name': 'Ethan Griffiths, Maryam Haghighat, Simon Denman, Clinton Fookes, Milad Ramezani'}] |
3D Place Recognition 3D位置识别 | v2 Lidar place recognition 3D reconstruction autonomous systems |
Input: Lidar point cloud data 激光雷达点云数据 Step1: Octree-based multi-scale attention mechanism 八叉树多尺度注意机制 Step2: Relay tokens for efficient communication 采用中继标记以提高通信效率 Step3: Pyramid attentional pooling for global descriptor synthesis 采用金字塔注意池化以合成全局描述符 Output: Robust global descriptors for place recognition 输出: 用于位置识别的鲁棒全局描述符 |
| 9.5 | [9.5] 2503.08142 A Framework for Reducing the Complexity of Geometric Vision Problems and its Application to Two-View Triangulation with Approximation Bounds [{'name': 'Felix Rydell, Georg Bökman, Fredrik Kahl, Kathlén Kohn'}] |
Structure from Motion (SfM) 运动结构估计 | v2 3D reconstruction triangulation Structure-from-Motion |
Input: Noisy 2D projections from multiple images 多个图像的噪声2D投影 Step1: Cost function reweighting 代价函数重加权 Step2: Simplification of polynomial degree to improve efficiency 降低多项式次数以提高效率 Step3: Derive optimal weighting strategies 推导最佳加权策略 Output: Closed-form solution for triangulation 三角测量的闭式解 |
| 9.5 | [9.5] 2503.08208 Explaining Human Preferences via Metrics for Structured 3D Reconstruction [{'name': 'Jack Langerman, Denys Rozumnyi, Yuzhong Huang, Dmytro Mishkin'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction metrics human preferences |
Input: Structured 3D reconstructions 结构三维重建 Step1: Evaluate automated metrics 评估自动化度量 Step2: Analyze human preferences 分析人类偏好 Step3: Propose metrics and recommendations 提出度量和建议 Output: Improved metric for 3D reconstructions 改进的三维重建度量 |
| 9.5 | [9.5] 2503.08217 S3R-GS: Streamlining the Pipeline for Large-Scale Street Scene Reconstruction [{'name': 'Guangting Zheng, Jiajun Deng, Xiaomeng Chu, Yu Yuan, Houqiang Li, Yanyong Zhang'}] |
3D Reconstruction and Modeling 3D重建与建模 | v2 3D reconstruction street scene |
Input: Multi-view images 多视角图像 Step1: Data integration 数据集成 Step2: Algorithm development 算法开发 Step3: Model evaluation 模型评估 Output: Streamlined reconstruction pipeline 精简的重建管线 |
| 9.5 | [9.5] 2503.08218 MVD-HuGaS: Human Gaussians from a Single Image via 3D Human Multi-view Diffusion Prior [{'name': 'Kaiqiang Xiong, Ying Feng, Qi Zhang, Jianbo Jiao, Yang Zhao, Zhihao Liang, Huachen Gao, Ronggang Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D human reconstruction multi-view diffusion model |
Input: Single image 单张图像 Step1: Generate multi-view images from a single reference image 从单个参考图像生成多视角图像 Step2: Introduce an alignment module for camera poses 引入相机位姿对齐模块 Step3: Optimize 3D Gaussians and refine facial regions 优化3D高斯,并细化面部区域 Output: High-fidelity free-view 3D human rendering 输出:高保真自由视图3D人类渲染 |
| 9.5 | [9.5] 2503.08219 CL-MVSNet: Unsupervised Multi-view Stereo with Dual-level Contrastive Learning [{'name': 'Kaiqiang Xiong, Rui Peng, Zhe Zhang, Tianxing Feng, Jianbo Jiao, Feng Gao, Ronggang Wang'}] |
Multi-view Stereo 多视角立体 | v2 3D reconstruction Multi-view Stereo contrastive learning autonomous driving |
Input: Multi-view images 多视角图像 Step1: Integrate dual-level contrastive learning 双层对比学习集成 Step2: Implement image-level contrastive loss 实现图像级对比损失 Step3: Implement scene-level contrastive loss 实现场景级对比损失 Step4: L0.5 photometric consistency loss implementation L0.5光度一致性损失实现 Output: Enhanced depth estimation 改进的深度估计 |
| 9.5 | [9.5] 2503.08224 HRAvatar: High-Quality and Relightable Gaussian Head Avatar [{'name': 'Dongbin Zhang, Yunfei Liu, Lijian Lin, Ye Zhu, Kangjie Chen, Minghan Qin, Yu Li, Haoqian Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction Gaussian Splatting head avatars |
Input: Monocular video input 单目视频输入 Step1: Optimize facial tracking through end-to-end training 优化面部追踪,采用端到端训练 Step2: Utilize learnable blendshapes for deformation 使用可学习的混合形状进行变形 Step3: Model head appearance using physical properties and shading techniques 使用物理属性和阴影技术建模头部外观 Output: High-fidelity, relightable 3D head avatars 输出:高保真、可重光照的三维头部头像 |
| 9.5 | [9.5] 2503.08336 Talk2PC: Enhancing 3D Visual Grounding through LiDAR and Radar Point Clouds Fusion for Autonomous Driving [{'name': 'Runwei Guan, Jianan Liu, Ningwei Ouyang, Daizong Liu, Xiaolou Sun, Lianqing Zheng, Ming Xu, Yutao Yue, Hui Xiong'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D visual grounding LiDAR radar autonomous driving |
Input: Dual-sensor inputs (LiDAR and radar) 双传感器输入 (激光雷达和雷达) Step 1: Feature extraction from LiDAR and radar sensor 数据提取: 从激光雷达和雷达传感器提取特征 Step 2: Dual-sensor feature fusion using Bidirectional Agent Cross Attention (BACA) 双传感器特征融合: 使用双向代理交叉注意力 (BACA) Step 3: Region localization using Dynamic Gated Graph Fusion (DGGF) 区域定位: 使用动态门控图融合 (DGGF) Output: 3D visual grounding prediction 3D视觉定位预测 |
| 9.5 | [9.5] 2503.08352 Mitigating Ambiguities in 3D Classification with Gaussian Splatting [{'name': 'Ruiqi Zhang, Hao Zhu, Jingyi Zhao, Qi Zhang, Xun Cao, Zhan Ma'}] |
3D Classification 3D 分类 | v2 3D classification Gaussian Splatting point clouds ambiguity |
Input: GS point cloud as input 输入: GS 点云 Step1: Analyze ambiguities in traditional point cloud 分析传统点云中的歧义 Step2: Implement Gaussian Splatting classification 实施高斯点云分类 Step3: Evaluate performance using a new dataset 通过新数据集评估性能 Output: Enhanced classification of 3D objects 输出: 改进的 3D 对象分类 |
| 9.5 | [9.5] 2503.08363 Parametric Point Cloud Completion for Polygonal Surface Reconstruction [{'name': 'Zhaiyu Chen, Yuqing Wang, Liangliang Nan, Xiao Xiang Zhu'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction point cloud completion polygonal surfaces |
Input: Incomplete point cloud 不完整点云 Step1: Infer parametric primitives 推断参数化原始体 Step2: Recover high-level geometric structures 恢复高层次几何结构 Step3: Construct polygonal surfaces from primitives 根据原始体构建多边形表面 Output: High-quality polygonal surface reconstruction 高质量多边形表面重建 |
| 9.5 | [9.5] 2503.08382 Twinner: Shining Light on Digital Twins in a Few Snaps [{'name': 'Jesus Zarzar, Tom Monnier, Roman Shapovalov, Andrea Vedaldi, David Novotny'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction PBR digital twins autonomous systems |
Input: Posed images 已知位姿的图像 Step1: Voxel-grid transformer encoding 体素网格变换器编码 Step2: Photometric error minimization 光度误差最小化 Step3: Model evaluation and comparison 模型评估与比较 Output: 3D geometry and materials 三维几何与材料 |
| 9.5 | [9.5] 2503.08407 WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images [{'name': 'Yansong Guo, Jie Hu, Yansong Qu, Liujuan Cao'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D segmentation 2D images real-time systems |
Input: 2D images of arbitrary 3D objects 任意3D物体的2D图像 Step1: Pre-processing: 2D mask feature construction 预处理:2D掩模特征构建 Step2: Dynamic Global Aligning (DGA) for accuracy improvement 动态全局对齐(DGA)来提升精度 Step3: Multi-view Group Mapping (MGM) for real-time segmentation 多视角组映射(MGM)实现实时分割 Output: Aligned 3D segmentation results 对齐的3D分割结果 |
| 9.5 | [9.5] 2503.08422 JiSAM: Alleviate Labeling Burden and Corner Case Problems in Autonomous Driving via Minimal Real-World Data [{'name': 'Runjian Chen, Wenqi Shao, Bo Zhang, Shaoshuai Shi, Li Jiang, Ping Luo'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D object detection LiDAR simulation-to-real autonomous driving |
Input: LiDAR point clouds from real and simulated environments Step1: Jittering augmentation to enhance sample efficiency Step2: Utilize a domain-aware backbone for better feature extraction Step3: Implement memory-based sectorized alignment loss to bridge the simulation-to-real gap Output: Effective 3D object detection with minimal real labels |
| 9.5 | [9.5] 2503.08511 PCGS: Progressive Compression of 3D Gaussian Splatting [{'name': 'Yihang Chen, Mengyao Li, Qianyi Wu, Weiyao Lin, Mehrtash Harandi, Jianfei Cai'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D Gaussian Splatting progressive compression novel view synthesis |
Input: 3D Gaussian Splatting data 3D高斯点云数据 Step1: Progressive masking strategy 渐进式掩码策略 Step2: Progressive quantization approach 渐进式量化方法 Step3: Entropy coding enhancement 熵编码优化 Output: Compressed bitstream with preserved fidelity 保真的压缩比特流 |
| 9.5 | [9.5] 2503.08516 High-Quality 3D Head Reconstruction from Any Single Portrait Image [{'name': 'Jianfu Zhang, Yujie Gao, Jiahui Zhan, Wentao Wang, Yiyi Zhang, Haohua Zhao, Liqing Zhang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction portrait images facial expressions |
Input: Single portrait image 单幅肖像图像 Step1: Data collection 数据收集 Step2: Multi-view video generation 多视角视频生成 Step3: Identity and expression integration 身份和表情整合 Step4: 3D head reconstruction 3D头部重建 Output: High-quality 3D head models 高质量3D头部模型 |
| 9.5 | [9.5] 2503.08594 3D Point Cloud Generation via Autoregressive Up-sampling [{'name': 'Ziqiao Meng, Qichao Wang, Zhipeng Zhou, Irwin King, Peilin Zhao'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D point cloud generation autoregressive modeling up-sampling |
Input: 3D point clouds 3D 点云 Step1: Learn multi-scale discrete representations 学习多尺度离散表示 Step2: Train autoregressive transformer 训练自回归变换器 Step3: Generate point clouds 生成点云 Output: Refined 3D point clouds 精炼的 3D 点云 |
| 9.5 | [9.5] 2503.08601 LiSu: A Dataset and Method for LiDAR Surface Normal Estimation [{'name': "Dušan Malić, Christian Fruhwirth-Reisinger, Samuel Schulter, Horst Possegger"}] |
3D Reconstruction 三维重建 | v2 LiDAR surface normal estimation autonomous driving 3D reconstruction |
Input: LiDAR point clouds LiDAR点云 Step1: Generate synthetic dataset 生成合成数据集 Step2: Develop surface normal estimation method 开发表面法线估计方法 Step3: Evaluate model performance 评估模型性能 Output: Accurate surface normals for 3D reconstruction 改进的三维重建表面法线 |
| 9.5 | [9.5] 2503.08639 GBlobs: Explicit Local Structure via Gaussian Blobs for Improved Cross-Domain LiDAR-based 3D Object Detection [{'name': "Dušan Malić, Christian Fruhwirth-Reisinger, Samuel Schulter, Horst Possegger"}] |
3D Object Detection 3D物体检测 | v2 3D object detection domain generalization LiDAR Gaussian blobs local geometry |
Input: LiDAR point cloud data 激光雷达点云数据 Step1: Encode local point cloud neighborhoods using Gaussian blobs 使用高斯点云对局部点云邻域进行编码 Step2: Integrate the Gaussian blobs into existing detection frameworks 将高斯点云集成到现有检测框架中 Step3: Evaluate model performance on cross-domain benchmarks 在跨域基准测试中评估模型性能 Output: Enhanced detection accuracy in domain generalization 在领域泛化中提高检测精度 |
| 9.5 | [9.5] 2503.08664 MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention [{'name': 'Yuhan Wang, Fangzhou Hong, Shuai Yang, Liming Jiang, Wayne Wu, Chen Change Loy'}] |
3D Generation 三维生成 | v2 3D generation multiview diffusion human modeling |
Input: Frontal image of a human figure 人物的正面图像 Step1: Establish correspondences using rasterization and projection 利用光栅化和投影建立对应关系 Step2: Introduce mesh attention to handle high resolution 引入网格注意力以处理高分辨率 Step3: Generate multiview images using the trained model 使用训练好的模型生成多视角图像 Output: Dense, view-consistent human images at megapixel resolution 输出:百万像素分辨率下的稠密一致人像图像 |
| 9.5 | [9.5] 2503.08676 Language-Depth Navigated Thermal and Visible Image Fusion [{'name': 'Jinchang Zhang, Zijun Li, Guoyu Lu'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction image fusion depth estimation autonomous driving |
Input: Infrared and visible images, along with depth information 输入: 红外和可见图像,及深度信息 Step1: Multi-channel feature extraction using a diffusion model 步骤1: 使用扩散模型进行多通道特征提取 Step2: Language-guided fusion with depth information 步骤2: 结合深度信息的语言指导融合 Step3: Depth estimation and optimization of the fusion network 步骤3: 深度估计并优化融合网络 Output: Enhanced color-fused images 输出: 改进的彩色融合图像 |
| 9.2 | [9.2] 2503.07946 7DGS: Unified Spatial-Temporal-Angular Gaussian Splatting [{'name': 'Zhongpai Gao, Benjamin Planche, Meng Zheng, Anwesa Choudhuri, Terrence Chen, Ziyan Wu'}] |
Neural Rendering 神经渲染 | v2 real-time rendering Gaussian Splatting dynamic scenes |
Input: Scene elements represented as 7D Gaussians 场景元素以7D高斯表示 Step1: Conditional slicing mechanism 条件切片机制 Step2: Joint optimization integration 联合优化集成 Step3: Rendering of dynamic scenes 渲染动态场景 Output: Real-time rendering with view-dependent effects 输出:支持视图依赖的实时渲染 |
| 9.2 | [9.2] 2503.08101 Accelerate 3D Object Detection Models via Zero-Shot Attention Key Pruning [{'name': 'Lizhen Xu, Xiuxiu Bai, Xiaojun Jia, Jianwu Fang, Shanmin Pang'}] |
3D Object Detection 3D目标检测 | v2 3D object detection 3D目标检测 zero-shot pruning 零样本剪枝 transformer decoders 变换器解码器 |
Input: 3D object detection models 3D目标检测模型 Step1: Classification score extraction 分类评分提取 Step2: Importance score computation 重要性评分计算 Step3: Key pruning based on importance 依据重要性进行关键字剪枝 Output: Accelerated inference speed 加速的推理速度 |
| 9.0 | [9.0] 2503.08373 nnInteractive: Redefining 3D Promptable Segmentation [{'name': 'Fabian Isensee, Maximilian Rokuss, Lars Krämer, Stefan Dinkelacker, Ashis Ravindran, Florian Stritzke, Benjamin Hamm, Tassilo Wald, Moritz Langenberg, Constantin Ulrich, Jonathan Deissler, Ralf Floca, Klaus Maier-Hein'}] |
3D Segmentation 三维分割 | v2 3D segmentation interactive segmentation volumetric data |
Input: User prompts (points, scribbles, bounding boxes, lasso) 用户提示(点、涂鸦、边界框、套索) Step1: Data integration from volumetric datasets 从体积数据集中进行数据集成 Step2: 3D interactive segmentation algorithm development 开发 3D 交互式分割算法 Step3: Integration into imaging platforms (e.g., Napari, MITK) 集成到成像平台(例如,Napari,MITK) Output: Full 3D segmentations from 2D interactions 从 2D 交互生成完整的 3D 分割 |
| 9.0 | [9.0] 2503.08471 TrackOcc: Camera-based 4D Panoptic Occupancy Tracking [{'name': 'Zhuoguang Chen, Kenan Li, Xiuyu Yang, Tao Jiang, Yiming Li, Hang Zhao'}] |
Autonomous Systems and Robotics 自动驾驶及机器人技术 | v2 4D occupancy tracking autonomous systems 3D tracking camera-based perception |
Input: Camera images 相机图像 Step1: Image feature extraction 图像特征提取 Step2: 4D panoptic queries integration 4D全景查询集成 Step3: Result prediction 结果预测 Output: Panoptic occupancy labels 全景占用标签 |
| 8.5 | [8.5] 2503.07813 AgriField3D: A Curated 3D Point Cloud and Procedural Model Dataset of Field-Grown Maize from a Diversity Panel [{'name': 'Elvis Kimara, Mozhgan Hadadi, Jackson Godbersen, Aditya Balu, Talukder Jubery, Yawei Li, Adarsh Krishnamurthy, Patrick S. Schnable, Baskar Ganapathysubramanian'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D point clouds agricultural research maize |
Input: 3D point clouds of maize plants 玉米植物的三维点云 Step1: Data collection 数据收集 Step2: Procedural model generation 程序模型生成 Step3: Graph-based segmentation 基于图的分割 Output: Curated dataset for agricultural research 为农业研究提供的整理数据集 |
| 8.5 | [8.5] 2503.07829 Fixing the RANSAC Stopping Criterion [{'name': 'Johannes Schönberger, Viktor Larsson, Marc Pollefeys'}] |
Multi-view and Stereo Vision 多视角立体视觉 | v2 RANSAC 3D reconstruction robust estimation |
Input: Noisy measurements 噪声测量 Step1: Analyze RANSAC sampling probability 分析RANSAC采样概率 Step2: Derive exact stopping criterion 推导精确停止准则 Step3: Evaluate model performance 评估模型性能 Output: Improved model estimation 改进的模型估计 |
| 8.5 | [8.5] 2503.07909 FunGraph: Functionality Aware 3D Scene Graphs for Language-Prompted Scene Interaction [{'name': 'Dennis Rotondi, Fabio Scaparro, Hermann Blum, Kai O. Arras'}] |
3D Scene Graphs 3D场景图 | v2 3D scene graphs functional interactive elements robot perception affordance grounding |
Input: Multi-view RGB-D images 多视角RGB-D图像 Step1: Detect functional elements 检测功能性元件 Step2: Augment 3D scene graph generation 扩展3D场景图生成 Step3: Evaluate functional segmentation 评估功能分割 Output: Enhanced 3D scene graphs 改进的3D场景图 |
| 8.5 | [8.5] 2503.07933 From Slices to Sequences: Autoregressive Tracking Transformer for Cohesive and Consistent 3D Lymph Node Detection in CT Scans [{'name': 'Qinji Yu, Yirui Wang, Ke Yan, Dandan Zheng, Dashan Ai, Dazhou Guo, Zhanghexuan Ji, Yanzhou Su, Yun Bian, Na Shen, Xiaowei Ding, Le Lu, Xianghua Ye, Dakai Jin'}] |
3D Reconstruction 三维重建 | v2 3D reconstruction autonomous driving |
Input: 3D CT scans Step1: Transform slice-based detection to a tracking task Step2: Develop a transformer decoder for tracking and detection Step3: Evaluate 3D instance association Output: Enhanced lymph node detection in 3D CT scans |
| 8.5 | [8.5] 2503.07939 STRMs: Spatial Temporal Reasoning Models for Vision-Based Localization Rivaling GPS Precision [{'name': 'Hin Wai Lui, Jeffrey L. Krichmar'}] |
Localization and Navigation 本地化与导航 | v2 vision-based localization 3D reconstruction autonomous navigation |
Input: First-person perspective observations (FPP) 第一人称视角观察 Step 1: Data transformation to global map perspective (GMP) 数据转换为全局地图视角 Step 2: Model training using VAE-RNN and VAE-Transformer 使用VAE-RNN和VAE-Transformer进行模型训练 Step 3: Performance evaluation in real-world environments 在真实环境中评估性能 Output: Precise geographical coordinates and localization capabilities 精确的地理坐标和定位能力 |
| 8.5 | [8.5] 2503.07942 STEAD: Spatio-Temporal Efficient Anomaly Detection for Time and Compute Sensitive Applications [{'name': 'Andrew Gao, Jun Liu'}] |
Autonomous Systems and Robotics 自动驾驶 | v2 anomaly detection autonomous driving |
Input: Video data 视频数据 Step1: Anomaly detection algorithm development 异常检测算法开发 Step2: Feature extraction 特征提取 Step3: Model evaluation 模型评估 Output: Identified anomalies 识别出的异常 |
| 8.5 | [8.5] 2503.08016 SGNetPose+: Stepwise Goal-Driven Networks with Pose Information for Trajectory Prediction in Autonomous Driving [{'name': 'Akshat Ghiya, Ali K. AlShami, Jugal Kalita'}] |
Autonomous Driving 自动驾驶 | v2 pedestrian trajectory prediction autonomous driving pose estimation skeleton information |
Input: Video data 视频数据 Step1: Extract skeleton information using ViTPose 从ViTPose提取骨骼信息 Step2: Compute joint angles based on skeleton data 根据骨骼数据计算关节角度 Step3: Integrate pose information with bounding box data 将姿态信息与边界框数据集成 Step4: Apply temporal data augmentation for improved performance 进行时间数据增强以提高性能 Output: Predicted pedestrian trajectories 预测行人轨迹 |
| 8.5 | [8.5] 2503.08068 Simulating Automotive Radar with Lidar and Camera Inputs [{'name': 'Peili Song, Dezhen Song, Yifan Yang, Enfan Lan, Jingtai Liu'}] |
Autonomous Systems and Robotics 自动驾驶 | v2 Automotive radar Autonomous driving Data simulation Neural networks Lidar and camera integration |
Input: Camera images and lidar point clouds 摄像头图像和激光雷达点云 Step1: Estimate radar signal distribution 估计雷达信号分布 Step2: Generate 4D radar signals 生成4D雷达信号 Step3: Predict radar signal strength (RSS) 预测雷达信号强度 (RSS) Output: Simulated radar datagram 输出: 模拟雷达数据包 |
| 8.5 | [8.5] 2503.08165 Multimodal Generation of Animatable 3D Human Models with AvatarForge [{'name': 'Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang'}] |
3D Generation 三维生成 | v2 3D human modeling 3D人类建模 animatable avatars 可动画头像 LLM integration LLM集成 |
Input: Text or image inputs 文本或图像输入 Step1: Capture detailed specifications 捕捉详细规范 Step2: Integrate LLM for commonsense reasoning 集成LLM进行常识推理 Step3: Utilize 3D human generators 利用3D人类生成器 Step4: Iterative refinement through auto-verification 通过自动验证进行迭代完善 Output: Customizable, animatable 3D human avatars 输出: 可定制的可动画3D人类头像 |
| 8.5 | [8.5] 2503.08377 Layton: Latent Consistency Tokenizer for 1024-pixel Image Reconstruction and Generation by 256 Tokens [{'name': 'Qingsong Xie, Zhao Zhang, Zhe Huang, Yanhao Zhang, Haonan Lu, Zhenyu Yang'}] |
Image Generation 图像生成 | v2 image reconstruction latent diffusion models tokenization |
Input: High-resolution images 高分辨率图像 Step1: Image tokenization 图像令牌化 Step2: Latent consistency decoding 潜在一致性解码 Step3: Token compression 令牌压缩 Output: Efficient 1024x1024 image representation 高效的1024x1024图像表示 |
| 8.5 | [8.5] 2503.08421 Learning to Detect Objects from Multi-Agent LiDAR Scans without Manual Labels [{'name': 'Qiming Xia, Wenkai Lin, Haoen Xiang, Xun Huang, Siheng Chen, Zhen Dong, Cheng Wang, Chenglu Wen'}] |
3D Object Detection 3D物体检测 | v2 3D object detection LiDAR scans unsupervised learning |
Input: Multi-agent LiDAR scans 多代理LiDAR扫描 Step1: Initialization with shared ego-pose and ego-shape 使用共享的自我姿态和自我形状初始化 Step2: Preliminary label generation 生成初步标签 Step3: Multi-scale encoding for label refinement 对标签进行多尺度编码以进行精炼 Step4: Contrastive learning with refined labels 使用精炼标签进行对比学习 Output: High-quality detection results 高质量检测结果 |
| 8.5 | [8.5] 2503.08483 GAS-NeRF: Geometry-Aware Stylization of Dynamic Radiance Fields [{'name': 'Nhat Phuong Anh Vu, Abhishek Saroha, Or Litany, Daniel Cremers'}] |
Neural Rendering 神经渲染 | v2 3D stylization dynamic radiance fields |
Input: Dynamic scenes 动态场景 Step1: Extract depth maps 提取深度图 Step2: Geometry and appearance stylization 几何和外观风格化 Step3: Temporal coherence maintenance 时间一致性维护 Output: Stylized dynamic radiance fields 风格化动态辐射场 |
| 8.5 | [8.5] 2503.08485 TT-GaussOcc: Test-Time Compute for Self-Supervised Occupancy Prediction via Spatio-Temporal Gaussian Splatting [{'name': 'Fengyi Zhang, Huitong Yang, Zheng Zhang, Zi Huang, Yadan Luo'}] |
3D Reconstruction and Modeling 三维重建 | v2 occupancy prediction 3D Gaussians autonomous driving |
Input: Raw sensor streams 原始传感器流 Step1: Lift surrounding-view semantics to instantiate Gaussians 提升周围视图语义以实例化高斯 Step2: Move dynamic Gaussians along estimated scene flow 沿估计的场景流移动动态高斯 Step3: Smooth neighboring Gaussians during optimization 在优化过程中平滑相邻高斯 Output: Voxelized occupancy prediction 体素化占用预测 |
| 8.5 | [8.5] 2503.08512 SAS: Segment Any 3D Scene with Integrated 2D Priors [{'name': 'Zhuoyuan Li, Jiahao Lu, Jiacheng Deng, Hanzhi Chang, Lifan Wu, Yanzhe Liang, Tianzhu Zhang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D scene understanding open vocabulary point cloud 2D to 3D correspondence |
Input: Point cloud features and 2D model capabilities 2D模型能力和点云特征 Step1: Model Alignment via Text 模型对齐 Step2: Annotation-Free Model Capability Construction 免标注模型能力构建 Step3: Feature distillation to 3D domain 特征蒸馏到3D域 Output: Integrated 3D scene representations 集成的3D场景表示 |
| 8.5 | [8.5] 2503.08596 X-Field: A Physically Grounded Representation for 3D X-ray Reconstruction [{'name': 'Feiran Wang, Jiachen Tao, Junyi Wu, Haoxuan Wang, Bin Duan, Kai Wang, Zongxin Yang, Yan Yan'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction X-ray imaging medical diagnostics |
Input: X-ray projections X射线投影 Step1: Material modeling 材料建模 Step2: Path partitioning algorithm 路径分区算法 Step3: Energy absorption estimation 能量吸收估算 Output: 3D representations of internal structures 内部结构的三维表示 |
| 8.5 | [8.5] 2503.08673 Keypoint Detection and Description for Raw Bayer Images [{'name': 'Jiakai Lin, Jinchang Zhang, Guoyu Lu'}] |
Robotic Perception 机器人感知 | v2 keypoint detection SLAM raw images |
Input: Raw Bayer images 原始拜尔图像 Step1: Develop convolutional kernels 开发卷积核 Step2: Direct keypoint detection 直接关键点检测 Step3: Feature description 特征描述 Output: Accurate keypoints and descriptors 准确的关键点和描述符 |
| 8.5 | [8.5] 2503.08683 CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving [{'name': 'Changxing Liu, Genjia Liu, Zijun Wang, Jinchang Yang, Siheng Chen'}] |
Autonomous Systems and Robotics 自主系统与机器人 | v2 cooperative autonomous driving vehicle-to-vehicle communication LLM-based negotiation real-time control |
Input: Vehicle-to-vehicle data 车辆间数据 Step1: LLM-based negotiation module 建立互动的LLM协商模块 Step2: Intention-guided waypoint generation 道路意图引导的路径生成 Step3: Real-time driving control 实时驾驶控制 Output: Improved cooperative driving performance 改进的合作驾驶性能 |
| 7.5 | [7.5] 2503.08368 Debiased Prompt Tuning in Vision-Language Model without Annotations [{'name': 'Chaoquan Jiang, Yunfan Yang, Rui Hu, Jitao Sang'}] |
Vision-Language Models (VLMs) 视觉语言模型 | v2 Vision-Language Models Robustness Debiased Prompt Tuning |
Input: Vision-Language Models (VLMs) 视觉语言模型 Step1: Analyze spurious correlations 分析虚假相关性 Step2: Utilize zero-shot recognition capabilities 利用零样本识别能力 Step3: Propose a debiased prompt tuning method 提出去偏置的提示调整方法 Output: Improved group robustness 提高的群体稳健性 |
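Because every batch uses the same row layout, the tables double as a machine-readable index. The sketch below shows one way to recover (relevance score, arXiv ID, title) records from rows like those above; the regex and field names are illustrative assumptions, not part of the repository's tooling.

```python
# Minimal sketch: parse paper-list rows back into structured records.
import re

# Matches rows shaped like:
# | 9.5 | [9.5] 2503.07739 SIRE: SE(3) Intrinsic Rigidity Embeddings [{'name': '...'}] |
ROW_RE = re.compile(
    r"\|\s*(?P<score>\d+(?:\.\d+)?)\s*\|"   # relevance column
    r"\s*\[\d+(?:\.\d+)?\]\s*"              # repeated "[score]" prefix
    r"(?P<arxiv_id>\d{4}\.\d{4,5})\s+"      # arXiv identifier
    r"(?P<title>.+?)\s*\[\{"                # title, up to the author dict
)

def parse_rows(markdown: str) -> list[dict]:
    """Extract (score, arXiv ID, title) records from paper-list rows."""
    records = []
    for line in markdown.splitlines():
        match = ROW_RE.search(line)
        if match:
            records.append({
                "score": float(match.group("score")),
                "arxiv_id": match.group("arxiv_id"),
                "title": match.group("title"),
            })
    return records

if __name__ == "__main__":
    row = ("| 9.5 | [9.5] 2503.07739 SIRE: SE(3) Intrinsic Rigidity "
           "Embeddings [{'name': 'Cameron Smith'}] |")
    print(parse_rows(row))
    # -> [{'score': 9.5, 'arxiv_id': '2503.07739', 'title': 'SIRE: ...'}]
```

Filtering the parsed records by score (for example, keeping entries at or above 9.0) gives a quick way to consume only the highest-relevance papers from each batch.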
Arxiv 2025-03-11
| Relevance | Title | Research Topic | Keywords | Pipeline |
|---|---|---|---|---|
| 9.5 | [9.5] 2503.06117 NeuraLoc: Visual Localization in Neural Implicit Map with Dual Complementary Features [{'name': 'Hongjia Zhai, Boming Zhao, Hai Li, Xiaokun Pan, Yijia He, Zhaopeng Cui, Hujun Bao, Guofeng Zhang'}] |
3D Reconstruction and Modeling 三维重建 | v2 visual localization neural implicit maps 3D modeling |
Input: 2D images with 3D context 提供2D图像与3D上下文 Step1: Extract 2D feature maps 提取2D特征图 Step2: Learn a 3D keypoint descriptor field 学习3D关键点描述符场 Step3: Align feature distributions 对齐特征分布 Step4: Establish matching graph 建立匹配图 Output: 6-DoF pose estimation 输出6自由度位姿估计 |
| 9.5 | [9.5] 2503.06154 SRM-Hair: Single Image Head Mesh Reconstruction via 3D Morphable Hair [{'name': 'Zidu Wang, Jiankuo Zhao, Miao Xu, Xiangyu Zhu, Zhen Lei'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D reconstruction 3DMM hair modeling |
Input: Single image 单张图像 Step1: Data collection 数据收集 Step2: Semantic-consistent ray modeling 语义一致的光线建模 Step3: Hair mesh reconstruction 头发网格重建 Output: 3D hair mesh 3D头发网格 |
| 9.5 | [9.5] 2503.06219 VLScene: Vision-Language Guidance Distillation for Camera-Based 3D Semantic Scene Completion [{'name': 'Meng Wang, Huilong Pi, Ruihui Li, Yunchuan Qin, Zhuo Tang, Kenli Li'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D semantic scene completion autonomous driving vision-language models |
Input: Camera-based images 相机采集图像 Step1: Vision-language guidance distillation 视觉语言指导蒸馏 Step2: Geometric-semantic awareness mechanism 几何-语义感知机制 Step3: Model evaluation 模型评估 Output: Enhanced 3D semantic representations 改进的三维语义表示 |
| 9.5 | [9.5] 2503.06222 Vision-based 3D Semantic Scene Completion via Capture Dynamic Representations [{'name': 'Meng Wang, Fan Wu, Yunchuan Qin, Ruihui Li, Zhuo Tang, Kenli Li'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D scene completion autonomous driving semantic scene completion |
Input: 2D images Step1: Extract 2D explicit semantics and align into 3D space Step2: Decouple scene information into dynamic and static features Step3: Design dynamic-static adaptive fusion module Output: Robust and accurate semantic scene representations |
| 9.5 | [9.5] 2503.06235 StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams [{'name': 'Yang LI, Jinglu Wang, Lei Chu, Xiao Li, Shiu-hong Kao, Ying-Cong Chen, Yan Lu'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction Gaussian Splatting |
Input: Unposed image streams 未标定图像流 Step1: Predict per-frame Gaussians 逐帧预测高斯 Step2: Establish pixel correspondences 建立像素对应关系 Step3: Merge redundant Gaussians 合并冗余高斯 Output: Online 3D reconstruction 在线三维重建 |
| 9.5 | [9.5] 2503.06237 Rethinking Lanes and Points in Complex Scenarios for Monocular 3D Lane Detection [{'name': 'Yifan Chang, Junjie Huang, Xiaofeng Wang, Yun Ye, Zhujin Liang, Yi Shan, Dalong Du, Xingang Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D lane detection autonomous driving geometric structures |
Input: Monocular images 单目图像 Step1: Theoretical analysis 理论分析 Step2: Patching strategy development 修补策略开发 Step3: Model enhancement 模型增强 Output: Improved lane representations 改进的车道表示 |
| 9.5 | [9.5] 2503.06462 StructGS: Adaptive Spherical Harmonics and Rendering Enhancements for Superior 3D Gaussian Splatting [{'name': 'Zexu Huang, Min Xu, Stuart Perry'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 3D Gaussian Splatting 3D reconstruction neural rendering |
Input: Data from multiple views of a scene 多视角场景数据 Step1: Utilization of 3D Gaussian Splatting 采用3D高斯涂抹 Step2: Dynamic adjustment of spherical harmonics 动态调整球谐 Step3: Incorporation of Multi-scale Residual Network (MSRN) 引入多尺度残差网络 Step4: Rendering of high-quality images from low-resolution inputs 从低分辨率输入生成高质量图像 Output: Enhanced novel views of 3D models 改进的3D模型新视图 |
| 9.5 | [9.5] 2503.06485 A Mesh Is Worth 512 Numbers: Spectral-domain Diffusion Modeling for High-dimension Shape Generation [{'name': 'Jiajie Fan, Amal Trigui, Andrea Bonfanti, Felix Dietrich, Thomas Bäck, Hao Wang'}] |
3D Generation 三维生成 | v2 3D generation spectral-domain diffusion mesh processing |
Input: High-dimensional shapes 高维形状 Step1: Shape encoding using SVD 采用SVD进行形状编码 Step2: Generative modeling on eigenfeatures 在特征向量上进行生成建模 Step3: Mesh generation based on spectral features 基于谱特征生成网格 Output: High-quality 3D shapes 生成高质量的三维形状 |
| 9.5 | [9.5] 2503.06565 Future-Aware Interaction Network For Motion Forecasting [{'name': 'Shijie Li, Xun Xu, Si Yong Yeo, Xulei Yang'}] |
Autonomous Driving 自动驾驶 | v2 motion forecasting autonomous driving spatiotemporal modeling |
Input: Scene encoding with historical trajectories 输入: 包含历史轨迹的场景编码 Step 1: Integrate future trajectories into encoding 步骤1: 将未来轨迹整合到编码中 Step 2: Use Mamba for spatiotemporal modeling 步骤2: 使用Mamba进行时空建模 Step 3: Refine and predict future trajectories 步骤3: 精炼并预测未来轨迹 Output: Accurate future trajectory predictions 输出: 准确的未来轨迹预测 |
| 9.5 | [9.5] 2503.06569 Global-Aware Monocular Semantic Scene Completion with State Space Models [{'name': 'Shijie Li, Zhongyao Cheng, Rong Li, Shuai Li, Juergen Gall, Xun Xu, Xulei Yang'}] |
3D Reconstruction and Modeling 三维重建与建模 | v2 Semantic Scene Completion 语义场景补全 3D Reconstruction 三维重建 Monocular Vision 单目视觉 |
Input: Single image 单幅图像 Step1: 2D feature extraction 2D特征提取 Step2: Long-range dependency modeling 长程依赖建模 Step3: 3D information completion 3D信息补全 Output: Complete 3D representation 完整的3D表现 |
| 9.5 | [9.5] 2503.06587 Introducing Unbiased Depth into 2D Gaussian Splatting for High-accuracy Surface Reconstruction [{'name': 'Xiaoming Peng, Yixin Yang, Yang Zhou, Hui Huang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction Gaussian Splatting surface reconstruction |
Input: 2D Gaussian Splatting data 2D高斯泼溅数据 Step1: Analyze reflection discontinuity 分析反射不连续性 Step2: Introduce depth convergence loss 引入深度收敛损失 Step3: Rectify depth criterion 修正深度准则 Output: Enhanced surface reconstruction 改进的表面重建 |
| 9.5 | [9.5] 2503.06660 AxisPose: Model-Free Matching-Free Single-Shot 6D Object Pose Estimation via Axis Generation [{'name': 'Yang Zou, Zhaoshuai Qi, Yating Liu, Zihao Xu, Weipeng Sun, Weiyi Liu, Xingyuan Li, Jiaqi Yang, Yanning Zhang'}] |
3D Reconstruction and Modeling 三维重建 | v2 6D pose estimation 6D姿态估计 robotics 机器人 autonomous driving 自动驾驶 computer vision 计算机视觉 |
Input: Single view image 单视图图像 Step1: Axis Generation Module (AGM) construction 轴生成模块(AGM)构建 Step2: Geometric consistency loss injection 几何一致性损失注入 Step3: Triaxial Back-projection Module (TBM) application 三轴反投影模块(TBM)应用 Output: Estimated 6D object pose 估计的6D物体姿态 |
| 9.5 | [9.5] 2503.06677 REArtGS: Reconstructing and Generating Articulated Objects via 3D Gaussian Splatting with Geometric and Motion Constraints [{'name': 'Di Wu, Liu Liu, Zhou Linli, Anran Huang, Liangtu Song, Qiaojun Yu, Qi Wu, Cewu Lu'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction articulated objects Gaussian Splatting |
Input: Multi-view RGB images of articulated objects 铰接物体的多视角RGB图像 Step1: Introduce Signed Distance Field (SDF) guidance to regularize Gaussian opacity fields 引入符号距离场(SDF)引导以规范化高斯不透明度场 Step2: Establish deformable fields for 3D Gaussians constrained by kinematic structures 建立受运动学结构约束的3D高斯可变形场 Step3: Achieve unsupervised generation of surface meshes in unseen states 实现对未见状态表面网格的无监督生成 Output: High-quality textured surface reconstruction and generation 高质量纹理表面重建与生成 |
| 9.5 | [9.5] 2503.06744 CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving [{'name': 'Rui Song, Chenwei Liang, Yan Xia, Walter Zimmer, Hu Cao, Holger Caesar, Andreas Festag, Alois Knoll'}] |
Multi-view and Stereo Vision 多视角与立体视觉 | v2 4D Gaussian Splatting dynamic scene rendering autonomous driving |
Input: Dynamic scenes 动态场景 Step1: Use 2D segmentation for Gaussian features 使用2D分割获取高斯特征 Step2: Track temporally deformed features 跟踪时间变形特征 Step3: Aggregate context and deformation features 组合上下文和变形特征 Output: Enhanced dynamic scene representations 改进的动态场景表示 |
| 9.5 | [9.5] 2503.06762 Gaussian RBFNet: Gaussian Radial Basis Functions for Fast and Accurate Representation and Reconstruction of Neural Fields [{'name': 'Abdelaziz Bouzidi, Hamid Laga, Hazem Wannous'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction neural fields Gaussian RBF |
Input: Neural fields and images 神经场和图像 Step1: Replace MLP neurons with RBF kernels 用RBF核替换MLP神经元 Step2: Train for 3D geometry representation 训练3D几何表示 Step3: Optimize for novel view synthesis 优化新视图合成 Output: Fast and accurate neural representation 快速准确的神经表示 |
| 9.5 | [9.5] 2503.06818 Sub-Image Recapture for Multi-View 3D Reconstruction [{'name': 'Yanwei Wang'}] |
3D Reconstruction and Modeling 三维重建 | v2 3D reconstruction multi-view geometry |
Input: Original high-resolution images 原始高分辨率图像 Step1: Split images into sub-images 将图像分割成子图像 Step2: Process sub-images individually 分别处理子图像 Step3: Apply existing 3D reconstruction algorithms to the sub-images 将现有三维重建算法应用于子图像 Output: Enhanced 3D reconstruction results 改进的三维重建结果 |
| 9.5 | [9.5] 2503.06821 HierDAMap: Towards Universal Domain Adaptive BEV Mapping via Hierarchical Perspective Priors [{'name': 'Siyu Li, Yihong Cao, Hao Shi, Yongsheng Zang, Xuan He, Kailun Yang, Zhiyong Li'}] |
Autonomous Driving 自动驾驶 | v2 Bird's-Eye View (BEV) mapping domain adaptation 3D mapping |
Input: Multi-view images 多视角图像 Step1: Hierarchical perspective prior-guided domain adaptation 分层视角先验引导的领域适应 Step2: Component integration 组件集成 (SGPS, DACL, CDFM) Step3: Performance evaluation 性能评估 Output: Enhanced BEV mapping results 改进的鸟瞰映射结果 |
| 9.5 | [9.5] 2503.06900 DirectTriGS: Triplane-based Gaussian Splatting Field Representation for 3D Generation |