That's cheap. I used to do manual human transcription as a side gig. That stuff is not easy, at all. The quality of the audio varies massively, especially from meetings, and it can be very difficult to interpret sometimes. I imagine machine transcription of a single short audio clip is orders of magnitude better than human transcription these days, if they have enough data to work with. It would be interesting to see how well the product works on some of the tougher audio sound bites.
For a service like this, it should be a piece of cake if it's always just one person talking directly into a mic in a quiet room, composing a message to submit as text online.