One idea is to use the browser's speech synthesis API, as in the example below.
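// Build the utterance and set its language (Arabic).
const msg = new SpeechSynthesisUtterance();
const lang1 = "ar";
msg.lang = lang1;
msg.text = "يرجى نسخ الجملة التالية أدناه";
// Prefer a voice that matches the language; voices[2] is not guaranteed to be Arabic.
// Note: getVoices() can return an empty list until the "voiceschanged" event has fired.
const voices = window.speechSynthesis.getVoices();
const voice = voices.find((v) => v.lang.startsWith(lang1));
if (voice) msg.voice = voice;
window.speechSynthesis.speak(msg);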
You can put this in a JavaScript action, which you can then call from a nanoflow.
Of course, this is not a full solution, but it shows that the browser APIs can be used to achieve this.
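If you want something a bit more robust, here is a rough sketch of the same idea that waits for the voice list to load and picks a voice by language instead of by index. The helper names (pickVoice, loadVoices, speakText) and the 2-second timeout are my own assumptions, not part of the browser API or of Mendix, so adjust them to your needs.

```javascript
// Hypothetical helper: pick the first voice whose language tag matches `lang`
// (exact match, or a regional variant such as "ar-SA" for "ar").
function pickVoice(voices, lang) {
  const wanted = lang.toLowerCase();
  return (
    voices.find((v) => {
      // Some platforms report tags with an underscore, e.g. "ar_EG".
      const tag = v.lang.toLowerCase().replace("_", "-");
      return tag === wanted || tag.startsWith(wanted + "-");
    }) || null
  );
}

// Hypothetical helper: resolve with the voice list, waiting for the
// "voiceschanged" event if the list is still empty. The timeout is an
// assumed fallback in case the event never fires.
function loadVoices(synth, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const voices = synth.getVoices();
    if (voices.length > 0) {
      resolve(voices);
      return;
    }
    const timer = setTimeout(() => resolve(synth.getVoices()), timeoutMs);
    synth.addEventListener(
      "voiceschanged",
      () => {
        clearTimeout(timer);
        resolve(synth.getVoices());
      },
      { once: true }
    );
  });
}

// Speak `text` in `lang` and resolve once the utterance has finished.
async function speakText(text, lang, synth = window.speechSynthesis) {
  const voices = await loadVoices(synth);
  const msg = new SpeechSynthesisUtterance(text);
  msg.lang = lang;
  const voice = pickVoice(voices, lang);
  if (voice) msg.voice = voice; // otherwise the browser picks a default voice
  return new Promise((resolve, reject) => {
    msg.onend = () => resolve();
    msg.onerror = (event) => reject(new Error(event.error));
    synth.speak(msg);
  });
}
```

Inside the JavaScript action you could then do something like await speakText(text, "ar"), assuming the action has a string parameter named text. Returning a promise means the nanoflow only continues after the speech has ended or failed.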