Google is now using speech-to-text conversion. That means they take the SPOKEN words from your videos and convert them to text.
While they won’t tell us WHY they do that, the obvious answer is that they are looking for RELEVANT CONTENT! That means what you SAY in your video can now be used to categorize and rank it.

