
We have transcripts of around 8,000 videos. However, these transcripts do not have any timed text (i.e., they are not time-coded to the video).

We are trying to find out whether software exists for this, or what approach to take to automatically time-code each transcript to its associated video.

Does software exist to do this?

If not, what software approach would you use to accomplish this feat?

Walter

2 Answers


I'm not aware of any fully automatic software solution, but this paper (SyncTS: Automatic synchronization of speech and text documents) describes a possible approach.

ABSTRACT

In this paper, we present an automatic approach for aligning speech signals to corresponding text documents. For this sake, we propose to first use text-to-speech synthesis (TTS) to obtain a speech signal from the textual representation. Subsequently, both speech signals are transformed to sequences of audio features which are then time-aligned using a variant of greedy dynamic time-warping (DTW). The proposed approach is both efficient (with linear running time), computationally simple, and does not rely on a prior training phase as it is necessary when using HMM-based approaches. It benefits from the combination of a) a novel type of speech feature, being correlated to the phonetic progression of speech, b) a greedy left-to-right variant of DTW, and c) the TTS-based approach for creating a feature representation from the input text documents. The feasibility of the proposed method is demonstrated in several experiments.
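As a rough illustration of the alignment stage described in the abstract, here is a minimal Python sketch. It assumes the TTS step and feature extraction are already done: both the synthetic (TTS) audio and the video's real audio have been turned into feature sequences at the same fixed hop size, and the TTS engine reports the frame where each word starts. It uses a simple greedy step rule that may differ from the paper's exact DTW variant. The names `greedy_dtw`, `word_times`, and `hop_seconds` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]


def greedy_dtw(synth: Sequence[Frame], real: Sequence[Frame]) -> List[Tuple[int, int]]:
    """Greedy left-to-right alignment of two non-empty feature sequences.

    At each step, move to whichever neighbouring cell (advance both,
    advance synth only, advance real only) has the lowest local distance.
    Every step advances at least one index, so the path has at most
    len(synth) + len(real) - 1 cells (linear running time).
    This is a simple greedy rule, not necessarily the paper's exact variant.
    """
    n, m = len(synth), len(real)
    i, j = 0, 0
    path = [(0, 0)]
    while i < n - 1 or j < m - 1:
        candidates = []
        if i < n - 1 and j < m - 1:
            candidates.append((math.dist(synth[i + 1], real[j + 1]), i + 1, j + 1))
        if i < n - 1:
            candidates.append((math.dist(synth[i + 1], real[j]), i + 1, j))
        if j < m - 1:
            candidates.append((math.dist(synth[i], real[j + 1]), i, j + 1))
        # Lowest distance wins; ties are broken by the smaller (i, j) pair.
        _, i, j = min(candidates)
        path.append((i, j))
    return path


def word_times(path: List[Tuple[int, int]], word_start_frames: Sequence[int],
               hop_seconds: float) -> List[float]:
    """Map each word's start frame in the synthetic audio to seconds in the real audio.

    word_start_frames is assumed (hypothetically) to be reported by the TTS engine.
    """
    first_match: Dict[int, int] = {}
    for i, j in path:
        first_match.setdefault(i, j)  # first real frame aligned to synthetic frame i
    return [first_match[f] * hop_seconds for f in word_start_frames]


# Toy 1-D features: the "real" speech is twice as slow as the synthetic speech.
synth = [[0.0], [1.0], [2.0], [3.0]]
real = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0], [3.0], [3.0]]
path = greedy_dtw(synth, real)
print(word_times(path, [0, 2], hop_seconds=0.5))  # [0.0, 2.0]
```

With real audio, the feature vectors would come from a speech-feature extractor (the paper proposes its own phonetically correlated feature), and the resulting word times could then be written out in a timed-text format.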

psychowood
    Links frequently die, which would make this answer fairly useless. Please summarize the possible approach in your answer. – Walter Feb 03 '13 at 14:12

Adobe Premiere Pro will let you attach a transcript to a video clip and will attempt to align it. It runs speech analysis (speech-to-text) on the clip's audio, using the transcript as reference material. It's better than nothing. Unfortunately, you have to do this manually for each clip, and with 8,000 clips that would not be very pleasant.

Alan Shutko