prev: Reflection on the use of Whisper
I passed 3 hours 44 minutes of recordings to the Whisper API.
- Speech to text - OpenAI API
python whisper.py 0.37s user 0.18s system 0% cpu 10:00.79 total
The second half was just noise, so it was effectively two hours of speech.
Next, I tried it with a one-hour study-session recording:
python whisper.py 0.34s user 0.14s system 0% cpu 3:27.50 total
Unlike the previous example, this one has speech for almost the entire hour.
One hour of audio took 3.5 minutes to process and cost about 40 cents.
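That 40-cent figure lines up with OpenAI's published Whisper rate of $0.006 per minute of audio. A minimal sketch of the arithmetic (the `transcription_cost` helper is my own illustration, not from the post):

```python
# Rough cost estimate for Whisper API transcription.
# Assumes the published rate of $0.006 per minute of audio.
WHISPER_RATE_PER_MIN = 0.006  # USD per minute (assumed from OpenAI's pricing page)

def transcription_cost(minutes: float) -> float:
    """Estimate the API cost in USD for a given audio duration, rounded to cents."""
    return round(minutes * WHISPER_RATE_PER_MIN, 2)

# One hour of audio comes out to $0.36, i.e. roughly 40 cents.
print(transcription_cost(60))
```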
I had Claude summarize the resulting transcriptions.
- Transcription and AI summary of the audio from the study group
Nothing hard about the code (I've added the client setup lines the snippet assumes):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

audio_file = open(audio_path, "rb")
transcription = client.audio.transcriptions.create(
    model="whisper-1", file=audio_file
)
print(transcription.text)
with open(f"whisper_out/{indir}_{audio}.txt", "w") as f:
    f.write(transcription.text)
```
The most time-consuming part was installing ffmpeg, which started pulling in gcc and other dependencies.
$ brew install ffmpeg
- Note that ffmpeg is only needed for splitting audio files, so it is unnecessary if the audio file is under 25 MB (the API's upload limit).
- The hour-long talk above is 15 MB, so if your recording app already splits recordings into one-hour files, you don't need it.
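Given the 25 MB limit, a quick way to decide whether a recording needs splitting is to divide its size by the limit. A minimal sketch (the `chunks_needed` helper and the 56 MB figure, extrapolated from 15 MB per hour over 3h44m, are my own illustration, not from the post):

```python
import math

API_LIMIT_MB = 25  # Whisper API's documented per-file upload limit

def chunks_needed(file_size_mb: float, limit_mb: float = API_LIMIT_MB) -> int:
    """How many roughly equal pieces a recording must be split into
    so that each piece stays within the API's size limit."""
    return max(1, math.ceil(file_size_mb / limit_mb))

# The hour-long session above was 15 MB, so it fits in one request.
print(chunks_needed(15))
# A 3h44m recording at the same bitrate (~56 MB) would need three pieces.
print(chunks_needed(56))
```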
This page is auto-translated from /nishio/Whisper2024-03-25 using DeepL. If you see something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.