<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Transport Research International Documentation (TRID)</title>
    <link>https://trid.trb.org/</link>
    <atom:link href="https://trid.trb.org/Record/RSS?s=PHNlYXJjaD48cGFyYW1zPjxwYXJhbSBuYW1lPSJkYXRlaW4iIHZhbHVlPSJhbGwiIC8+PHBhcmFtIG5hbWU9InN1YmplY3Rsb2dpYyIgdmFsdWU9Im9yIiAvPjxwYXJhbSBuYW1lPSJ0ZXJtc2xvZ2ljIiB2YWx1ZT0ib3IiIC8+PHBhcmFtIG5hbWU9ImxvY2F0aW9uIiB2YWx1ZT0iMCIgLz48L3BhcmFtcz48ZmlsdGVycz48ZmlsdGVyIGZpZWxkPSJpbmRleHRlcm1zIiB2YWx1ZT0iJnF1b3Q7U3BlZWNoIHN5bnRoZXNpcyZxdW90OyIgb3JpZ2luYWxfdmFsdWU9IiZxdW90O1NwZWVjaCBzeW50aGVzaXMmcXVvdDsiIC8+PC9maWx0ZXJzPjxyYW5nZXMgLz48c29ydHM+PHNvcnQgZmllbGQ9InB1Ymxpc2hlZCIgb3JkZXI9ImRlc2MiIC8+PC9zb3J0cz48cGVyc2lzdHM+PHBlcnNpc3QgbmFtZT0icmFuZ2V0eXBlIiB2YWx1ZT0icHVibGlzaGVkZGF0ZSIgLz48L3BlcnNpc3RzPjwvc2VhcmNoPg==" rel="self" type="application/rss+xml" />
    <description></description>
    <language>en-us</language>
    <copyright>Copyright © 2026. National Academy of Sciences. All rights reserved.</copyright>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <managingEditor>tris-trb@nas.edu (Bill McLeod)</managingEditor>
    <webMaster>tris-trb@nas.edu (Bill McLeod)</webMaster>
    <image>
      <title>Transport Research International Documentation (TRID)</title>
      <url>https://trid.trb.org/Images/PageHeader-wTitle.jpg</url>
      <link>https://trid.trb.org/</link>
    </image>
    <item>
      <title>Refining maritime Automatic Speech Recognition by leveraging synthetic speech</title>
      <link>https://trid.trb.org/View/2417449</link>
      <description><![CDATA[Maritime transport serves as a critical component of global trade and logistics, enabling the movement of goods and resources across oceans and waterways. Especially in busy waterways and ports, effective and accurate communication is essential, as it ensures the seamless exchange of information and the coordinated execution of port activities. However, comprehensibility is often hindered by factors such as poor audio quality, background noise, and diverse languages and accents. Automatic Speech Recognition (ASR) systems can mitigate these issues by providing real-time transcription and enabling the implementation of automated, value-adding services to enhance situational awareness. While pre-trained ASR models excel on general speech, maritime ASR faces unique challenges due to a lack of annotated data, diverse accents, and specialized terminology. To this end, the authors focus on improving the transcription quality of pre-trained ASR models for maritime communication with a particular focus on accurately recognizing maritime-specific terminology such as vessel and location names. Due to the scarcity of transcribed maritime communication, the authors create a synthetic training dataset tailored to regional maritime terminology. The synthetic audio is augmented with general human speech and used to fine-tune an end-to-end ASR model under various settings. The evaluation of the models employs a proprietary dataset of regional maritime radio communication from the port of Hamburg. The experimental results demonstrate a notable enhancement in ASR performance. Specifically, the authors' approach yields an absolute improvement over the pre-trained baseline of 13.46% Word-Error-Rate and an increase of 41.57% recall for vessel names and 38.65% recall for locations. The authors' findings underscore the efficacy of integrating synthetic training data to address the challenges encountered in maritime ASR, paving the way for more robust and accurate speech recognition systems tailored to maritime applications.]]></description>
      <pubDate>Thu, 22 Aug 2024 15:09:39 GMT</pubDate>
      <guid>https://trid.trb.org/View/2417449</guid>
    </item>
    <item>
      <title>VHF Speech Enhancement Based on Transformer</title>
      <link>https://trid.trb.org/View/2031647</link>
      <description><![CDATA[To address the poor quality of very high frequency (VHF) speech communication in the navigation field, this paper proposes a VHF speech enhancement model based on an improved transformer (VHFSE). Long-term and short-term noise are the reasons for the poor quality of VHF voice communication, and VHFSE can reduce both. The authors select the Two-stage Transformer based Neural Network (TSTNN) as the baseline. The Transformer structure attends to global information and supports parallel computing, which can reduce the long-term noise. To strengthen the model's ability to reduce short-term noise, they add a CNN module to the transformer, drawing on the ability of convolutional neural networks (CNN) to extract local information. Meanwhile, to improve real-time performance, this study employs a lightweight convolution module (Depthwise Separable Convolution) to improve the efficiency of VHF speech communication. Experimental results show that the proposed VHFSE model obtains higher PESQ and STOI values than the other compared models. In addition, they apply their proposed model to a self-built dataset. The spectrum diagram shows that their model has the best enhancement effect on navigation VHF speech.]]></description>
      <pubDate>Wed, 30 Nov 2022 10:59:54 GMT</pubDate>
      <guid>https://trid.trb.org/View/2031647</guid>
    </item>
    <item>
      <title>NextGen Flight Deck Data Comm: Auxiliary Synthetic Speech Phase II</title>
      <link>https://trid.trb.org/View/1370430</link>
      <description><![CDATA[Data Comm—a text-based controller-pilot communication system—is expected to yield several Next Generation Air Transportation System (NextGen) safety and efficiency benefits. With Data Comm, communication becomes a visual task, and may potentially increase head-down time on the flight deck as crews interact with the display. This study examined the feasibility of supplementing Data Comm with synthetic speech in commercial, en-route operations. To this end, 32 air-transport pilots (16 flightcrews) flew two experimental conditions in a Boeing 737-800 fixed-base simulator. In one condition, Data Comm was implemented with a text-only display, and, in the other it was implemented with a text display and synthetic speech that annunciated each message (text+speech). Results indicated that the text+speech display aided the performance of flightcrews compared to text only, without introducing additional complications. Relative to the text-only display, the text+speech display yielded less head-down time. Flightcrews did not delay opening or acknowledging a text+speech message when the party line was active. The majority of pilots reported that the text+speech display was easy to use, helpful, and not distracting; however, this acceptance was attenuated in major-airline pilots. Taken together, these results provide preliminary guidance for aircraft certification regarding the use and implementation of synthetic speech on the flight deck.]]></description>
      <pubDate>Wed, 30 Sep 2015 09:07:13 GMT</pubDate>
      <guid>https://trid.trb.org/View/1370430</guid>
    </item>
    <item>
      <title>Intuitive Speech Interface for Vehicle Information Systems</title>
      <link>https://trid.trb.org/View/1276839</link>
      <description><![CDATA[In this paper the authors proposed an intuitive speech interface with which a user can retrieve information through speech dialogue. If the user says a keyword that the system has already read aloud, this interface returns information corresponding to the keyword. The authors' system, which includes this interface, contains a dynamic dictionary and response control. From the dialogue, the system analyzes retrieved information using morphological analysis, and dynamically generates a recognition dictionary based on the morpheme and pronunciation of the words. The dynamic dictionary realizes the recognition of words that the system reads aloud. Also, response control obtains information from speech recognition results and previously spoken topics even if a part of the user’s spoken word is unclear. Because the user only repeats words that are heard, this interface can be used in eyes-free or hands-free situations. The proposed method was implemented on the system and the authors found that the task accomplishment rate was 96.1% for the condition where the SNR was 6.6 dB, which is the assumed noise level in the car.]]></description>
      <pubDate>Thu, 21 Nov 2013 09:14:42 GMT</pubDate>
      <guid>https://trid.trb.org/View/1276839</guid>
    </item>
    <item>
      <title>NextGen Flight Deck Data Comm: Auxiliary Synthetic Speech Phase I</title>
      <link>https://trid.trb.org/View/1259493</link>
      <description><![CDATA[Data Comm—a text-based controller-pilot communication system—is critical to many Next Generation Air Transportation System (NextGen) improvements. With Data Comm, communication becomes a visual task. Interacting with a visual Data Comm display may yield an unsafe increase in head-down time, particularly for single-pilot operations. This study examined the feasibility of supplementing Data Comm with synthetic speech. To this end, 32 pilots flew two experimental scenarios in a Cessna 172 Flight Training Device. In one scenario, air traffic control (ATC) communication was with a text-only Data Comm display; in the other, communication was with a text Data Comm display with synthetic speech that read aloud each message (i.e., text+speech). Pilots heard traffic with similar call signs on the party line and received a conditional clearance (in both scenarios); in either scenario, pilots received a clearance that was countermanded by a live controller. Results indicated that relative to the text-only display, the text+speech display aided single-pilot performance by reducing head-down time, and may have prevented participants from acting early on the conditional clearance. Supplementing text Data Comm with speech did not introduce additional complications: participants were neither more likely to erroneously respond to similar call signs, nor to ignore a live ATC voice countermand.]]></description>
      <pubDate>Tue, 03 Sep 2013 12:24:23 GMT</pubDate>
      <guid>https://trid.trb.org/View/1259493</guid>
    </item>
    <item>
      <title>Speaking the same language</title>
      <link>https://trid.trb.org/View/1085410</link>
      <description><![CDATA[Subtitle: Phoneme-based technology makes passenger announcements clearer.]]></description>
      <pubDate>Thu, 30 Dec 2010 10:18:39 GMT</pubDate>
      <guid>https://trid.trb.org/View/1085410</guid>
    </item>
    <item>
      <title>Intelligent Scheduling &amp; Dispatching In An ITS-Enabled World</title>
      <link>https://trid.trb.org/View/903102</link>
      <description><![CDATA[Para-transit trips are scheduled and, on the day of service, dispatched in response to changes in the trip set and unpredictable traffic conditions. With AVL/MDC on vehicles, ITS data flow gives dispatchers 20-20 vision to “see” the world in real-time, and a powerful voice to “tell” drivers about potentially drastic and massive changes to their work in real-time. Prior polling of drivers via voice radio is analogous to blurred vision and hoarseness, limiting the scope of dispatching actions. For Transit Authorities of some size, automatic scheduling and dispatching systems are crucial to help dispatchers as they schedule requests and move trips around to maintain efficient fleet usage and honor all trip commitments. But with ITS, more is now possible. We present requirements for new, intelligent dispatching options in the ITS-enabled world, and how these requirements are being met by Adept, StrataGen’s automatic scheduling and dispatching system.]]></description>
      <pubDate>Tue, 17 Nov 2009 14:59:21 GMT</pubDate>
      <guid>https://trid.trb.org/View/903102</guid>
    </item>
    <item>
      <title>Performing E-Mail Tasks while Driving: the Impact of Speech-Based Tasks on Visual Detection</title>
      <link>https://trid.trb.org/View/763278</link>
      <description><![CDATA[Drivers listened and responded to e-mail messages presented in a human voice and two types of synthetic speech (concatenative and formant) while driving a simulator. Their performance for visual event detection, vehicle control, and message responses was assessed. Results indicated that the type of speech output system affected drivers’ detection of visual changes in the driving environment; they were poorer at detecting these events when either of the synthetic speech systems was used. Drivers detected fewer visual changes during the difficult messages than during the baseline driving. No effects of the speech system type or e-mail message difficulty were observed on the vehicle control measures. Drivers were also less accurate when responding to message content for messages presented in synthetic speech (concatenative) compared with recorded human voice. Subjective ratings indicated that listening to the synthetic speech required more mental effort than listening to the recorded human voice. Preference ratings for the interfaces decreased as mental effort increased. The results indicated that although drivers were not required to direct their attention away from the road, using the speech-based interfaces reduced drivers’ visual event detection and their response accuracy to messages themselves.]]></description>
      <pubDate>Thu, 03 Nov 2005 14:12:19 GMT</pubDate>
      <guid>https://trid.trb.org/View/763278</guid>
    </item>
    <item>
      <title>THE INFLUENCE OF IN-VEHICLE NOISE ON SPEECH RECOGNITION FOR AUTOMOTIVE VOICE-ACTIVATED CONTROL SYSTEMS</title>
      <link>https://trid.trb.org/View/747065</link>
      <description><![CDATA[In this paper, the author focuses on research defining speech recognition performance thresholds to in-vehicle noise for a typical in-vehicle voice-activated speech recognition (VSR) system. By identifying the in-vehicle noise parameters that compromise VSR for both male and female speakers, it should be possible to identify the primary parameters that cause breakdown in VSR performance in vehicles.]]></description>
      <pubDate>Thu, 06 Jan 2005 00:00:00 GMT</pubDate>
      <guid>https://trid.trb.org/View/747065</guid>
    </item>
    <item>
      <title>SPEECH-PROCESSING: SNCF APPLICATIONS</title>
      <link>https://trid.trb.org/View/277519</link>
      <description><![CDATA[This article, which analyses speech processing and its applications at SNCF, examines in turn speech synthesis (installed systems: passenger announcements in stations, train phones) and speech recognition (installed systems: telephone call speech composers for regulators: electronic secretary).]]></description>
      <pubDate>Sat, 28 Aug 2004 04:47:57 GMT</pubDate>
      <guid>https://trid.trb.org/View/277519</guid>
    </item>
    <item>
      <title>COMPACT DISCS TAKE ON DATA STORAGE</title>
      <link>https://trid.trb.org/View/218206</link>
      <description><![CDATA[The Philips Car Information and Navigation System (CARIN) stores a database of maps, allowing a computer in a car to plan the most efficient route across a town. It informs the driver where to turn at every road junction. The information is relayed to the driver through a speech synthesis unit. As a prototype, a minibus in Eindhoven was equipped with a computer, a compact disc player, and a display screen. When the compact disc is not being used for directions, it can play music for the passengers. (TRRL)]]></description>
      <pubDate>Wed, 25 Aug 2004 02:43:51 GMT</pubDate>
      <guid>https://trid.trb.org/View/218206</guid>
    </item>
    <item>
      <title>FACTORS THAT INFLUENCE INTELLIGIBILITY IN MULTITALKER SPEECH DISPLAYS</title>
      <link>https://trid.trb.org/View/704730</link>
      <description><![CDATA[A well-designed multitalker speech display could improve aviation safety and reduce operator workload. This paper presents the results of a number of experiments that have used the coordinate response measure to examine the impact of four monaural design parameters on overall intelligibility in a multitalker communications task. The authors also present the results of a new experiment that has used the same procedure to examine the influence of two additional factors in binaural speech displays: (a) the apparent spatial locations of the talkers and (b) the listener's a priori information about the listening task. Findings suggest that the most efficient way to improve the effectiveness of a multitalker speech display is to use virtual synthesis techniques to separate the locations of the competing talkers.]]></description>
      <pubDate>Sun, 01 Aug 2004 00:00:00 GMT</pubDate>
      <guid>https://trid.trb.org/View/704730</guid>
    </item>
    <item>
      <title>SPOKEN DIALOGUE TECHNOLOGIES FOR DRIVERS</title>
      <link>https://trid.trb.org/View/700955</link>
      <description><![CDATA[This paper offers a detailed overview of spoken dialogue technologies for drivers, including a history of spoken language technologies, current applications, and challenges inherent in using this technology in vehicles in the future. The authors describe this technology as having several components: speech recognition, language understanding, dialogue management, language generation and speech synthesis. They also describe the possible applications for drivers and the range of challenges to system developers. The authors draw multiple conclusions, from which they derive suggestions for those designing and marketing products for the future.]]></description>
      <pubDate>Tue, 04 May 2004 00:00:00 GMT</pubDate>
      <guid>https://trid.trb.org/View/700955</guid>
    </item>
    <item>
      <title>DESTINATION ENTRY WHILE DRIVING: SPEECH RECOGNITION VERSUS A TOUCH-SCREEN KEYBOARD</title>
      <link>https://trid.trb.org/View/724116</link>
      <description><![CDATA[This report describes a study of the effect of several destination-entry methods on driving performance. In the study, participants drove a simulator on roads with curves of several radii while using one of three methods for entering addresses on a navigation system. The methods were touch-screen keyboard typing, speech recognition - character spelling, and speech recognition - word dictation. The study recorded and analyzed speed and accuracy of address entry, detailed measures of driving performance, and subject ratings of difficulty and safety.]]></description>
      <pubDate>Thu, 02 Oct 2003 00:00:00 GMT</pubDate>
      <guid>https://trid.trb.org/View/724116</guid>
    </item>
    <item>
      <title>COMMUNICATIONS NETWORK FOR AUTOMOTIVE USE : THE CAR.NET</title>
      <link>https://trid.trb.org/View/664456</link>
      <description><![CDATA[This paper describes the Car.NET strategy, a mobile computing system providing information and communication capabilities to drivers. It also gives an example of a non-distracting telematics application by describing a speech recognition-based navigation system.]]></description>
      <pubDate>Thu, 02 Oct 2003 00:00:00 GMT</pubDate>
      <guid>https://trid.trb.org/View/664456</guid>
    </item>
  </channel>
</rss>