<rss version="2.0" xmlns:atom="https://www.w3.org/2005/Atom">
  <channel>
    <title>Transport Research International Documentation (TRID)</title>
    <link>https://trid.trb.org/</link>
    <atom:link href="https://trid.trb.org/Record/RSS?s=PHNlYXJjaD48cGFyYW1zPjxwYXJhbSBuYW1lPSJkYXRlaW4iIHZhbHVlPSJhbGwiIC8+PHBhcmFtIG5hbWU9InN1YmplY3Rsb2dpYyIgdmFsdWU9Im9yIiAvPjxwYXJhbSBuYW1lPSJ0ZXJtc2xvZ2ljIiB2YWx1ZT0ib3IiIC8+PHBhcmFtIG5hbWU9ImxvY2F0aW9uIiB2YWx1ZT0iMCIgLz48L3BhcmFtcz48ZmlsdGVycz48ZmlsdGVyIGZpZWxkPSJpbmRleHRlcm1zIiB2YWx1ZT0iJnF1b3Q7VmlzdWFsIHBlcmNlcHRpb24mcXVvdDsiIG9yaWdpbmFsX3ZhbHVlPSImcXVvdDtWaXN1YWwgcGVyY2VwdGlvbiZxdW90OyIgLz48L2ZpbHRlcnM+PHJhbmdlcyAvPjxzb3J0cz48c29ydCBmaWVsZD0icHVibGlzaGVkIiBvcmRlcj0iZGVzYyIgLz48L3NvcnRzPjxwZXJzaXN0cz48cGVyc2lzdCBuYW1lPSJyYW5nZXR5cGUiIHZhbHVlPSJwdWJsaXNoZWRkYXRlIiAvPjwvcGVyc2lzdHM+PC9zZWFyY2g+" rel="self" type="application/rss+xml" />
    <description></description>
    <language>en-us</language>
    <copyright>Copyright © 2026. National Academy of Sciences. All rights reserved.</copyright>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <managingEditor>tris-trb@nas.edu (Bill McLeod)</managingEditor>
    <webMaster>tris-trb@nas.edu (Bill McLeod)</webMaster>
    <image>
      <title>Transport Research International Documentation (TRID)</title>
      <url>https://trid.trb.org/Images/PageHeader-wTitle.jpg</url>
      <link>https://trid.trb.org/</link>
    </image>
    <item>
      <title>Quantifying Visual Attention of Teams During Workload Transitions Using AOI-Based Cross-Recurrence Metrics</title>
      <link>https://trid.trb.org/View/2680947</link>
      <description><![CDATA[Cross-recurrence quantification analysis (CRQA) metrics may offer a means to provide information about the quality of collaboration in real time. The goal of the present work is to use Area of Interest (AOI) based CRQA metrics to analyze the eye-tracking data of 10 pairs who participated in a shared unmanned aerial vehicle (UAV) command and control task. We are interested in how teams respond to workload transitions and how those transitions affect AOI-based CRQA metrics. The results showed that as workload increased, team members spent a longer time on the same task, which may indicate either that they are coordinating on a task or that they are not adapting and are getting “trapped” in certain tasks. The findings suggest that CRQA AOI-based metrics are sensitive to workload changes and validate these metrics in unraveling the visual puzzle of how workload impacts scan-path patterns, which contributes to quantifying the adaptation process of pairs over time. This also has the potential to inform the design of real-time technology in the future.]]></description>
      <pubDate>Sat, 02 May 2026 15:47:30 GMT</pubDate>
      <guid>https://trid.trb.org/View/2680947</guid>
    </item>
    <item>
      <title>Location-Aware Transformer Network for Bird's Eye View Semantic Segmentation</title>
      <link>https://trid.trb.org/View/2659098</link>
      <description><![CDATA[Bird's Eye View (BEV) segmentation with multiple surrounding cameras is crucial in autonomous driving due to its intuitive top-down view of the road environment. Despite the success of previous transformer-based networks, earlier works did not fully utilize the fact that each location on the BEV map requires features at its own resolution from the perspective view (PV). For instance, high-resolution features are needed for distant locations, while low-resolution features are sufficient for nearby areas. Therefore, a suitable combination of different resolution features is necessary to accurately handle variations in appearance in the PV, such as scale, lighting, context, and occlusion across various locations. To address this, we propose a new BEV segmentation network named Location-Aware Transformer Network (LATNet). LATNet blends different resolutions of PV image features using Location-Aware Attention, leveraging the correlation between the resolution of PV features and their location on the BEV map. By doing so, LATNet can create robust features for any BEV map location. Additionally, we introduce Ego-Centric Aware Flip (ECA Flip), a novel augmentation strategy for multi-camera-based BEV segmentation. Conventional augmentation methods disrupt the geometric relationship between PV and BEV, making them unsuitable for multi-camera-based BEV segmentation tasks. In contrast, ECA Flip can augment the original data up to fourfold without compromising the geometric relationships, significantly boosting the network's performance. Our approach achieves state-of-the-art results with real-time inference speed for camera-based semantic segmentation on both the nuScenes and Argoverse datasets.]]></description>
      <pubDate>Wed, 29 Apr 2026 09:10:08 GMT</pubDate>
      <guid>https://trid.trb.org/View/2659098</guid>
    </item>
    <item>
      <title>A Unified Monocular Vision-Based Driving Model for Autonomous Vehicles With Multi-Task Capabilities</title>
      <link>https://trid.trb.org/View/2659093</link>
      <description><![CDATA[The recent progress in autonomous driving primarily relies on sensor-rich systems, encompassing radars, LiDARs, and advanced cameras, in order to perceive the environment. However, human-operated vehicles showcase an impressive ability to drive based solely on visual perception. This study introduces an end-to-end method for predicting the steering angle and vehicle speed exclusively from a monocular camera image. Alongside the color image, which conveys scene texture and appearance details, a monocular depth image and a semantic segmentation image are internally derived and incorporated, offering insights into spatial and semantic environmental structures. This results in a total of three input images. Moreover, LSTM units are also employed to acquire temporal features. The proposed model demonstrates a significant enhancement in RMSE compared to the state-of-the-art, achieving a notable improvement of 44.96% for the steering angle and 4.39% for the speed on the Udacity dataset. Furthermore, tests on the CARLA and Sully Chen datasets yield results that outperform those reported in the literature. Extensive ablation studies are also conducted to showcase the effectiveness of each component. These findings highlight the potential of self-driving systems using visual input alone.]]></description>
      <pubDate>Wed, 29 Apr 2026 09:10:08 GMT</pubDate>
      <guid>https://trid.trb.org/View/2659093</guid>
    </item>
    <item>
      <title>Visual Perception Stack for Autonomous Vehicle</title>
      <link>https://trid.trb.org/View/2581841</link>
      <description><![CDATA[A visual perception stack is indispensable for present-day autonomous vehicles. It perceives the environment around the ego vehicle in much the same way as humans do. Stereo cameras are the prominent sensor for the visual perception stack. This paper focuses on a deep-level understanding of visual perception using a stereo camera, image processing, and crucial aspects required for autonomous cars. Other sensors such as LIDAR, GNSS, and IMU are also used for environment perception, but the information from stereo cameras is far more profitable and useful than other sensor information. First, the image formation phenomenon is discussed, followed by image projection onto different frames such as the world frame, camera frame, image coordinates, and pixel coordinates. Then camera calibration is discussed, and the intrinsic parameters of the stereo camera are obtained using the RQ factorization method. Depth perception from the stereo camera is performed by identifying the epipolar line and by arriving at the disparity map, depth map, and finally the cross correlation. Feature detection, feature description, and feature matching are then essential in order to establish robust image detection. The algorithms involved in each part are discussed along with outlier rejection using the RANSAC algorithm. In order to obtain fine image detection, a CNN is used along with a pooling layer and a feature decoder to upsample the image. The output of semantic segmentation is useful for drivable space estimation, object detection, and distance-to-collision in the environment.]]></description>
      <pubDate>Mon, 27 Apr 2026 15:01:26 GMT</pubDate>
      <guid>https://trid.trb.org/View/2581841</guid>
    </item>
    <item>
      <title>Use of a Monocular and Binocular Head-Worn Display in Lieu of a Head-Up Display During Approach, Landing, and Rollout: Human Factors Evaluation of Pilot Performance and Workload</title>
      <link>https://trid.trb.org/View/2689425</link>
      <description><![CDATA[The approach, landing, and rollout is a complex, critical operation for pilots of fixed-wing aircraft, particularly when flight visibility is limited by weather. To enhance the safety of this operation, aircraft can be equipped with a Head-Up Display (HUD), which presents flight symbology on a transparent screen at a focal distance of optical infinity so that the pilot can view primary flight information while maintaining visual contact with the runway. The Head-Worn Display (HWD) is an emerging technology that is designed to provide the benefits of a HUD. However, it may incorporate optical differences that impact pilots’ performance and workload. HWDs can be binocular (i.e., displaying symbology to both eyes) or monocular (i.e., displaying symbology to a single eye). When flying with a monocular HWD, binocular rivalry may impact the pilot’s ability to use the symbology and impose greater demands on the pilot’s attention. This raises questions about whether using a monocular HWD impacts pilots’ flying performance, elevates workload, and increases the risk of attentional tunneling. Pilot performance and workload may also be impacted by the physical and optical differences of the HWD relative to those of the HUD. To address these concerns, a study was carried out in which 24 pilot crews, each consisting of two Airline Transport Pilot (ATP) Captains, flew approach and landing scenarios with varying visibility levels, some of which included non-normal events, in a Boeing 737 Level D-equivalent flight simulator while using flight symbology presented on a HUD, binocular HWD, and monocular HWD. Simulator motion was disabled in the study to prevent interference with the HWD head tracking system. Quantitative measures of pilot flying performance were implemented to evaluate the effects of each display type on flightpath and energy management, landing and rollout performance, and response to non-normal events. 
Pilots rated their workload during each scenario using the National Aeronautics and Space Administration Task Load Index (NASA-TLX). The findings of this study suggest that a monocular HWD may not have a substantial impact on a pilot’s ability to manage the flightpath and energy state during approach, landing, and rollout operations. However, pilots experienced a higher workload when flying with the monocular HWD than with the binocular HWD and HUD. There were impacts on landing performance and runway incursion detection attributable to the optical characteristics of the HWD relative to those of the HUD, as well as the monocular versus binocular configuration of the HWD. Ultimately, this research contributes to the understanding of how visual attention is impacted by monocular viewing and provides operational takeaways for the use of an HWD in lieu of a HUD during low-visibility flight operations.]]></description>
      <pubDate>Thu, 16 Apr 2026 16:54:37 GMT</pubDate>
      <guid>https://trid.trb.org/View/2689425</guid>
    </item>
    <item>
      <title>Driving visual information in highway tunnel entrances: A computational method based on optical flow and color quantification</title>
      <link>https://trid.trb.org/View/2680652</link>
      <description><![CDATA[The environmental landscape of highway tunnel entrance zones is closely related to driving performance. To investigate the impact mechanism of environmental information volume on drivers’ visual workload in tunnel entrance zones, this study proposes a novel computational method for quantifying visual information. The aim is to provide a theoretical basis for improving tunnel entrance environments and enhancing driving safety. Field experiments on highways collected environmental images, vehicle dynamics, and drivers’ speed and psychological data from eight tunnel entrances. Visual field images were divided into five regions based on attention range: upper portal, central portal, left/right roadside, and pavement. HSV values were extracted to describe color and texture features. A model combining optical flow, sight distance, lane width, and speed quantified visual information volume, including traffic signs, and analyzed its relationship with visual workload. The subjective questionnaire results were consistent with the objective computational findings, verifying the reliability of the proposed method. Tunnel entrances with complex landscapes and diverse traffic signs exhibited higher levels of visual information, with drivers’ gaze distributed across four areas: both sides of the road, the tunnel entrance center, and the roadway. In contrast, entrances with simpler landscapes and fewer signs had lower visual information levels, and drivers’ gaze was mainly concentrated on the roadway and the tunnel entrance center. The proposed visual information quantification method effectively evaluates the impact of tunnel entrance environmental characteristics on driving visual workload. Appropriately controlling the proportion of traffic sign information (15%–25.55%) helps balance visual workload and comfort, while excessive or insufficient information may lead to discomfort due to underload or overload. 
These findings provide theoretical guidance and practical recommendations for optimizing tunnel entrance landscape design, traffic sign arrangement, and traffic safety enhancement.]]></description>
      <pubDate>Wed, 15 Apr 2026 10:29:29 GMT</pubDate>
      <guid>https://trid.trb.org/View/2680652</guid>
    </item>
    <item>
      <title>Do visual guiding facilities in freeway tunnels affect drivers’ perception of longitudinal safety distance? A simulation experiment</title>
      <link>https://trid.trb.org/View/2686637</link>
      <description><![CDATA[The safety perception of longitudinal distance by drivers in tunnels is critical for road safety. However, existing studies mainly focus on the effects of visual guiding facilities on speed perception and vehicle position, with limited research on their impact on longitudinal distance perception. This study aims to evaluate the effect of different types of visual guiding facilities on drivers’ safety perception of longitudinal distance in freeway tunnels. The experimental design considered the type of facility, including dot-shaped, linear, and ring-shaped visual guiding facilities. The linear visual guiding facilities are further categorized by length. The inter-facility spacing is treated as a variable parameter. Forty participants are involved in simulated experiments, during which subjective perceptions of longitudinal distance and perception reaction time data are collected. The impact of facility variables on safety perception of longitudinal distance is assessed. Dot-shaped visual guiding facilities do not improve drivers’ judgment of longitudinal distance. Ring-shaped visual guiding facilities have the most significant positive impact on drivers’ safety perception of longitudinal distance. The effectiveness of linear visual guiding facilities increases with their length; shorter lengths are less effective. To enhance drivers’ perception of longitudinal distance in tunnel mid-sections, retroreflective rings with a spacing of no more than 200 meters should be installed. Additionally, longer vertical retroreflective stripes can be used to supplement this, with spacing ranging from 50 to 100 meters.]]></description>
      <pubDate>Tue, 14 Apr 2026 16:59:47 GMT</pubDate>
      <guid>https://trid.trb.org/View/2686637</guid>
    </item>
    <item>
      <title>Assessment of cyclists' cognitive workload through eye tracker and EEG sensors: Sensitivity to individual and external factors in a real-world experiment</title>
      <link>https://trid.trb.org/View/2680139</link>
      <description><![CDATA[This exploratory study aims to assess cyclists' mental workload by collecting psychophysiological data, including eye-tracking and electroencephalography (EEG). A multi-factor, real-world experiment was conducted to correlate psychophysiological measures with kinematic riding features, objects in the cyclist's field of vision, and the surrounding road and urban context. Additionally, the study investigates whether subjective ratings align with objective measures.  Participants first completed a pre-questionnaire capturing demographic information and cycling frequency. Then, they rode an instrumented bicycle equipped with GNSS/INS sensors in real traffic conditions while wearing eye-tracking glasses and an EEG headset. This setup tracked the bicycle path and recorded gaze behavior and brain activity. After completing the route, participants provided segment-level ratings of mental workload using the NASA-TLX questionnaire.  The ten participants who provided useful data indicated that cyclists could be grouped based on their recorded mental states and the visual patterns identified along the route. Due to the complexity of the correlations and the heterogeneity of the data, machine learning was applied to investigate the relevance of different features in the variability of cognitive sensor measures. The eye tracker and EEG measures revealed individual factors influencing mental workload levels and showed evidence of common and differentiated sensitivity to factors related to objects in the field of view, spatial context, and kinematic riding behavior. Similarly, various levels of correlation were found between subjective and objective data when measuring mental workload.  This real-world pilot study assessed mental workload using objective psychophysiological data collected via sensors, offering insights into interpreting visual patterns and EEG indicators to support the selection of the most appropriate measures for further studies. 
Future research can use the proposed experimental design and methodological framework to validate and extend the results to a larger population and road and traffic conditions.]]></description>
      <pubDate>Wed, 08 Apr 2026 15:32:50 GMT</pubDate>
      <guid>https://trid.trb.org/View/2680139</guid>
    </item>
    <item>
      <title>Eye-tracking and visual processing tests for assessing driving ability in individuals with dementia and mild cognitive impairment: A pilot study</title>
      <link>https://trid.trb.org/View/2681652</link>
      <description><![CDATA[Visual processing has been found to be affected in the early stages of dementia, potentially limiting driving ability. This pilot study investigated the sensitivity and specificity of eye-tracking, visual processing, and dementia screening tests in evaluating driving abilities among older drivers with and without cognitive impairment. Twenty-three participants aged 65+ years (n = 10 with cognitive impairment, 13 healthy controls) underwent dementia screening assessments, including the Mini Mental State Examination (MMSE) and Hopkins Verbal Learning Test (HVLT), a Visual Sensitivity Test (VST), and eye-tracking tasks (pro-saccade, anti-saccade, prospective eye movements). These were compared against a computerized driving-related hazard perception test (HPT) and self-report driving measures. Correlation analyses and ROC curves were used to explore relationships among the outcome measures. Drivers with cognitive impairment did not report different subjective driving performance, but had significantly lower HPT scores, with most scoring below the Driver and Vehicle Licensing Agency (DVLA) requirement for licensure. Eye-tracking data (n = 19) showed that drivers with cognitive impairment exhibited greater pro-saccade latency variability. Anti-saccade latency and prospective eye movement tests both correlated with self-reported in-vehicle task performance. The VST and HVLT tests strongly correlated with HPT scores and were highly predictive of scoring below the HPT DVLA cut-off scores. The VST and HVLT demonstrated high sensitivity and specificity for screening poor hazard perception performance in older drivers with cognitive impairment. Impaired eye movements correlated with self-reported difficulties in operating in-vehicle tasks, but not with HPT performance. Further research is needed to verify these findings in on-road assessments and with a larger sample size.]]></description>
      <pubDate>Wed, 08 Apr 2026 13:40:53 GMT</pubDate>
      <guid>https://trid.trb.org/View/2681652</guid>
    </item>
    <item>
      <title>Adapting Generic RGB-D Salient Object Detection for Specific Traffic Scenarios</title>
      <link>https://trid.trb.org/View/2591308</link>
      <description><![CDATA[Existing RGB-D salient object detection (SOD) models are primarily trained on general-purpose datasets, which may lead to domain shift issues when applied directly to new, specific scenes, such as stereo traffic datasets. Although large-scale datasets (COME15K and ReDweb-S) have been released, they only partially address the domain shift problem. From the perspective of data augmentation, this paper presents a novel solution that takes a weakly supervised approach to adapting generic RGB-D SOD models for specific scenarios, with a focus on traffic scene imagery. Our key idea is to equip plain videos (specific scenarios, i.e., traffic scenes) with newly estimated saliency-informative depth maps and pseudo-SOD GTs, enabling them to support the retraining of existing RGB-D SOD models to meet the requirements of these specific scenes. To achieve this, we offer a fresh perspective on how depth information can be leveraged in the SOD task and introduce a new paradigm for extracting intrinsic information from optical flows derived from videos to refine RGB-D SOD models. Our method achieves a 1.2% improvement in F-measure on RGB-D datasets and a 27% enhancement on real-world street view datasets compared to baseline models. These results demonstrate the effectiveness of our approach in enhancing model adaptability for traffic scene imagery, even with limited target domain data. Codes, datasets, and results are available at https://github.com/MengkeSong/AGSS.]]></description>
      <pubDate>Mon, 30 Mar 2026 17:10:23 GMT</pubDate>
      <guid>https://trid.trb.org/View/2591308</guid>
    </item>
    <item>
      <title>Robust Vehicle Localization for Spherical Camera Models: Solution, Framework, and Verification</title>
      <link>https://trid.trb.org/View/2591302</link>
      <description><![CDATA[Vehicle visual localization uses vision sensors to capture environmental information, enabling precise localization of autonomous vehicles within their surroundings. However, current visual localization methods generally have two shortcomings: on the one hand, they are limited by the camera’s field of view; on the other hand, their robustness is often inadequate under challenging conditions such as lighting changes, long-term scene changes, or occlusions. To address these issues, we formulate a general spherical camera model for both fisheye and panoramic cameras and propose a minimal solution for pose estimation using this model based on vehicle motion characteristics. The minimal solution cannot filter outliers, so a robust estimation framework is necessary. For outlier rejection, we introduce two frameworks: a probabilistic optimal RANSAC and a globally optimal graph-based framework. We conduct a probabilistic analysis of the RANSAC to demonstrate its enhanced robustness given by the proposed minimal solution. To achieve robustness to extreme outliers (higher than 90%), we decouple the rotation and translation space through the minimal solution to construct maximum consensus graphs for the two sub-problems. We then employ a maximum clique search algorithm to find the optimal solutions, achieving deterministic convergence while maintaining real-time performance. Extensive experiments with synthetic data, real-world fisheye images, and 360° panoramic images validate the robustness and efficiency of our proposed algorithms.]]></description>
      <pubDate>Mon, 30 Mar 2026 17:10:22 GMT</pubDate>
      <guid>https://trid.trb.org/View/2591302</guid>
    </item>
    <item>
      <title>OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird’s-Eye-View Vehicle Semantic Segmentation</title>
      <link>https://trid.trb.org/View/2591300</link>
      <description><![CDATA[Bird’s-eye-view (BEV) semantic segmentation is becoming crucial in autonomous driving systems. It realizes ego-vehicle surrounding environment perception by projecting 2D multi-view images into 3D world space. Recently, BEV segmentation has made notable progress, attributed to better view transformation modules, larger image encoders, or more temporal information. However, there are still two issues: 1) a lack of effective understanding and enhancement of BEV space features, particularly in accurately capturing long-distance environmental features and 2) recognizing fine details of target objects. To address these issues, we propose OE-BevSeg, an end-to-end multimodal framework that enhances BEV segmentation performance through global environment-aware perception and local target object enhancement. OE-BevSeg employs an environment-aware BEV compressor. Based on prior knowledge about the main composition of the BEV surrounding environment varying with the increase of distance intervals, long-sequence global modeling is utilized to improve the model’s understanding and perception of the environment. From the perspective of enriching target object information in segmentation results, we introduce the center-informed object enhancement module, using centerness information to supervise and guide the segmentation head, thereby enhancing segmentation performance from a local enhancement perspective. Additionally, we designed a multimodal fusion branch that integrates multi-view RGB image features with radar/LiDAR features, achieving significant performance improvements. Extensive experiments show that, whether in camera-only or multimodal fusion BEV segmentation tasks, our approach achieves state-of-the-art results by a large margin on the nuScenes dataset for vehicle segmentation, demonstrating superior applicability in the field of autonomous driving. 
Our code will be released at https://github.com/SunJ1025/OE-BevSeg.]]></description>
      <pubDate>Mon, 30 Mar 2026 17:10:22 GMT</pubDate>
      <guid>https://trid.trb.org/View/2591300</guid>
    </item>
    <item>
      <title>RoCalib: Large-Scale Autonomous Geo-Calibration for Roadside Lidar With High-Definition Map</title>
      <link>https://trid.trb.org/View/2610685</link>
      <description><![CDATA[To address the challenges of low overlap and viewing direction differences in large-scale roadside LiDAR calibration, this paper proposes RoCalib, a novel automatic roadside LiDAR calibration method based on the high-definition (HD) map. This method enables geographic registration of the LiDAR without any specific target and improves the efficiency and safety of the large-scale roadside facility calibration and maintenance. First, a novel virtual reprojection model is designed to construct a virtual mapping from the HD map to LiDAR, reducing representation differences. Based on this, a universal spatial context descriptor is introduced, applicable to various LiDAR systems, facilitating rapid retrieval of LiDAR positions within the HD map. Finally, based on the multi-feature optimization method considering the road structure, the fine registration and parameter calibration of the roadside LiDAR and the HD map are completed. The proposed framework is validated on simulated, public, and self-collected datasets, demonstrating that this method can automatically and accurately achieve multi-LiDAR geographic calibration, yielding superior performance.]]></description>
      <pubDate>Fri, 27 Mar 2026 17:03:28 GMT</pubDate>
      <guid>https://trid.trb.org/View/2610685</guid>
    </item>
    <item>
      <title>Hybrid Matching Teacher Framework for Cross-Domain Visual Detection Transformer</title>
      <link>https://trid.trb.org/View/2610671</link>
      <description><![CDATA[Object detection is a critical component of autonomous vehicle perception systems. However, domain shifts between training environments and real-world scenarios often degrade detector performance. Cross-domain object detection aims to adapt detectors to unlabeled target domains utilizing only labeled source data. Recent popular cross-domain object detection methods employ the mean teacher framework, which uses pseudo-labels generated by the teacher model to guide training on unlabeled real-world data. Despite its effectiveness, continuous training with noisy pseudo-labels leads to abnormal performance degradation in the later stages of training. To address this issue, we propose a novel Hybrid Matching Teacher (HMT) framework for cross-domain visual detection transformers, which enhances cross-domain knowledge transfer across pseudo-label generation, filtering, and training processes. Specifically, we design a Feature Sparse Alignment (FSA) module to adapt DETR tokens and queries, generate domain-adaptive weights to initialize the teacher-student models, and mitigate the inherent initial source bias in the teacher model. Next, a Localization-aware Pseudo-label Filtering (LPF) module ensures high-quality pseudo-labels by considering the consistency between localization and classification tasks. Furthermore, to improve the efficiency of pseudo-label training, the Cross-view Hybrid Matching (CHM) module introduces an auxiliary matching branch to increase the number of positive queries that match with pseudo-labels. Extensive experiments demonstrate that our approach achieves state-of-the-art performance, outperforming previous benchmarks by 3.1%, 8.5%, and 4.4% in adverse weather, diverse scenes, and synthetic-to-real, respectively.]]></description>
      <pubDate>Thu, 26 Mar 2026 17:02:25 GMT</pubDate>
      <guid>https://trid.trb.org/View/2610671</guid>
    </item>
    <item>
      <title>BEVMamba: Time Sequence Dense Bird’s-Eye-View Perception Modeling With State Space Model</title>
      <link>https://trid.trb.org/View/2610666</link>
      <description><![CDATA[BEV-based 3D perception with multi-frame image input is crucial for autonomous driving. However, current methods for temporal BEV perception fail to fully utilize long sequence features because of local fusion or high complexity. Recently, Mamba, a powerful temporal modeling network with linear complexity, has shown exceptional performance in various 2D vision tasks, but its application to 3D perception tasks remains unexplored. Therefore, this paper proposes a general BEV perception backbone named BEVMamba, which is the first work to leverage a State Space Model for 3D perception. Building upon BEVFormer, and to adapt Mamba for 3D perception, we first add Hybrid Positional Encoding to the BEV features, enabling the network to be aware of their spatial-temporal position. In the Temporal SSM block, the proposed 3D Factorized Scan ensures that historical BEV features are enriched with global temporal-spatial information. Subsequently, the Spatial-Temporal Corridor Fusion aggregates all BEV features in a physically meaningful manner, achieving precise feature fusion. The reliable BEV features obtained by BEVMamba are used for various perception tasks, including 3D object detection and 3D occupancy prediction. Results on the nuScenes and Occ-3D nuScenes datasets show that BEVMamba outperforms its baseline BEVFormer in both dense and sparse perception tasks and demonstrates competitive performance compared to other methods, highlighting the potential of Mamba in 3D perception tasks. The code will be available at https://github.com/Liuxiaoaaa/bevmamba]]></description>
      <pubDate>Thu, 26 Mar 2026 17:02:25 GMT</pubDate>
      <guid>https://trid.trb.org/View/2610666</guid>
    </item>
  </channel>
</rss>