Since the element does have a class, you could use getElementsByClassName instead. Note that it returns a list of elements (a live HTMLCollection) rather than a single element, so you'll have to fetch the first one afterwards, like below:
const myClassElements = document.getElementsByClassName('my-class');
const firstElement = myClassElements[0]; // Access the first element
You'll then have to check whether the speech-to-text element's content can be accessed the same way, or whether something needs to change there, but this should at least let you select the right element.
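One thing to watch for: if nothing on the page has that class, the list is empty and indexing it with [0] gives undefined, so any property access afterwards will throw. Here is a rough sketch of a guard for that case. The helper name firstByClassName and the class name 'my-class' are placeholders I made up, not anything from your page, so adjust them to your markup.

```javascript
// Hypothetical helper: returns the first element carrying the given class,
// or null when nothing matches. `root` is anything that exposes
// getElementsByClassName, such as `document` or another element.
function firstByClassName(root, className) {
  const matches = root.getElementsByClassName(className);
  return matches.length > 0 ? matches[0] : null;
}

// Only touch the DOM when one is available (i.e. when running in a browser).
if (typeof document !== 'undefined') {
  const firstElement = firstByClassName(document, 'my-class'); // placeholder class name
  if (firstElement === null) {
    console.warn('No element with class "my-class" was found');
  } else {
    console.log(firstElement);
  }
}
```

If you only ever need the first match, document.querySelector('.my-class') is another option: it returns that element directly, or null if there is none.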