Reflection on the use of Whisper

I passed 3 hours 44 minutes of recordings to the Whisper API.

  • Speech to text - OpenAI API

```
python whisper.py  0.37s user 0.18s system 0% cpu 10:00.79 total
```

The second half was just noise, so it was actually two hours of speech.

Try it with a one-hour study session recording:

```
python whisper.py  0.34s user 0.14s system 0% cpu 3:27.50 total
```

Unlike the previous example, this one has speech for almost the entire hour. One hour of audio, about 3.5 minutes of processing time, and a cost of roughly 40 cents (the Whisper API is billed at $0.006 per minute, so 60 minutes comes to $0.36). I had Claude summarize the resulting transcriptions.

  • Transcription and AI summary of the audio from the study group

Nothing hard about the code:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# audio_path, indir, and audio come from the surrounding loop over files
with open(audio_path, "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1", file=audio_file
    )
print(transcription.text)
with open(f"whisper_out/{indir}_{audio}.txt", "w") as f:
    f.write(transcription.text)
```
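For the Claude summary step, a minimal sketch using the Anthropic Python SDK might look like this; the model name, prompt, and transcript path are assumptions for illustration, not from the original:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical transcript path; the original doesn't name its output files.
with open("whisper_out/session.txt") as f:
    transcript = f.read()

message = client.messages.create(
    model="claude-3-opus-20240229",  # assumed model; the text only says "Claude"
    max_tokens=1024,
    messages=[
        {"role": "user", "content": f"Summarize this transcript:\n\n{transcript}"}
    ],
)
print(message.content[0].text)
```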

The most time-consuming part was installing ffmpeg, which pulled in gcc and a pile of other dependencies.

  • $ brew install ffmpeg
  • Note that ffmpeg is only needed for splitting audio files, so you can skip it if your audio files are under the API's 25MB limit.
  • The hour-long talking session above is 15MB, so if your recording app already splits recordings into one-hour chunks, you don't need it (a splitting sketch follows this list).
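For files over the 25MB limit, one way to split them is pydub, which uses ffmpeg under the hood; the input file name and chunk length below are arbitrary choices for illustration:

```python
from pydub import AudioSegment  # requires ffmpeg on the PATH

# Hypothetical input file; any format ffmpeg understands works.
audio = AudioSegment.from_file("session.m4a")

chunk_ms = 20 * 60 * 1000  # 20-minute chunks, an arbitrary size
for i in range(0, len(audio), chunk_ms):  # len() is in milliseconds
    chunk = audio[i : i + chunk_ms]
    chunk.export(f"chunk_{i // chunk_ms:03d}.mp3", format="mp3")
```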

This page is auto-translated from /nishio/Whisper2024-03-25 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.