Question

Machine Learning Kit: Preprocessing inputs


I’m trying out the ML Kit and I found a model that I imported successfully as an ML Model Mapping into Mendix 10: the BERT-squad. It’s a simple one that is supposed to take one paragraph and multiple questions, then answer those questions about the paragraph.

https://github.com/onnx/models/tree/main/text/machine_comprehension/bert-squad

I got the model to import with no errors. However, the inputs aren’t quite what I expected. You simply can’t pass text. This is what the inputs look like in the mapping. And when I create an input object these are the attributes I need to assign:

I’m not exactly sure how to process the questions and paragraph and assign them to these attributes. The above link gives an example of a paragraph and three questions saved in a JSON file, and then gives the following code to process it:

    # preprocess input
    predict_file = 'inputs.json'

    # Use read_squad_examples method from run_onnx_squad to read the input file
    eval_examples = read_squad_examples(input_file=predict_file)

    max_seq_length = 256
    doc_stride = 128
    max_query_length = 64
    batch_size = 1
    n_best_size = 20
    max_answer_length = 30

    vocab_file = os.path.join('uncased_L-12_H-768_A-12', 'vocab.txt')
    tokenizer = tokenization.FullTokenizer(vocab_file=vocab_file, do_lower_case=True)

    # Use convert_examples_to_features method from run_onnx_squad to get parameters from the input
    input_ids, input_mask, segment_ids, extra_data = convert_examples_to_features(eval_examples, tokenizer, max_seq_length, doc_stride, max_query_length)

I’m not exactly sure how I would accomplish this in a Microflow in Mendix. The code refers to functions such as “read_squad_examples” and “convert_examples_to_features”, and I’m not sure how to use them. Is this something where I would have to crack open Eclipse and create a custom Java action?
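For anyone wondering what input_ids, input_mask and segment_ids actually contain, here is a minimal sketch of the standard BERT question-answering layout: [CLS] question [SEP] paragraph [SEP], padded to a fixed length. It is not the code from run_onnx_squad. The function name build_features, the whitespace tokenizer and the tiny vocabulary are assumptions made up for illustration; the real pipeline uses WordPiece tokenization against vocab.txt and also splits long paragraphs with doc_stride, which this sketch skips.

```python
from typing import Dict, List, Tuple


def build_features(
    question: str,
    paragraph: str,
    vocab: Dict[str, int],
    max_seq_length: int = 16,
) -> Tuple[List[int], List[int], List[int]]:
    """Hypothetical helper: lay out one question/paragraph pair the way BERT expects."""
    # Toy tokenizer (assumption): lowercase and split on whitespace.
    # The real model needs WordPiece tokens from vocab.txt.
    q_tokens = question.lower().split()
    p_tokens = paragraph.lower().split()

    # Reserve three slots for [CLS] and the two [SEP] markers,
    # then truncate the paragraph to whatever room is left.
    room = max_seq_length - len(q_tokens) - 3
    if room < 0:
        raise ValueError("question is too long for max_seq_length")
    p_tokens = p_tokens[:room]

    tokens = ["[CLS]"] + q_tokens + ["[SEP]"] + p_tokens + ["[SEP]"]
    # Segment 0 covers [CLS] + question + first [SEP]; segment 1 covers the rest.
    segment_ids = [0] * (len(q_tokens) + 2) + [1] * (len(p_tokens) + 1)

    unk_id = vocab["[UNK]"]
    input_ids = [vocab.get(tok, unk_id) for tok in tokens]
    # Mask is 1 for real tokens and 0 for padding.
    input_mask = [1] * len(input_ids)

    # Pad all three lists to the fixed length the model was exported with.
    pad = max_seq_length - len(input_ids)
    input_ids += [vocab["[PAD]"]] * pad
    input_mask += [0] * pad
    segment_ids += [0] * pad
    return input_ids, input_mask, segment_ids


# Made-up vocabulary for the example; real ids come from vocab.txt.
toy_vocab = {"[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
             "who": 4, "wrote": 5, "it": 6, "alice": 7}
ids, mask, segments = build_features(
    "Who wrote it", "Alice wrote it yesterday", toy_vocab, max_seq_length=12)
print(ids)
print(mask)
print(segments)
```

The three lists are what would be assigned to the corresponding attributes of the input object. A Java action in Mendix would need to do the same steps, with a proper WordPiece tokenizer in place of the whitespace split.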

asked 2023-07-12

Brian Lorraine

2 answers

Brian Lorraine · Answer 1 · 2023-08-07

To anyone who reads this: the Mendix ML Toolkit has a *LONG* way to go before it becomes useful. I tried out the example project from the other answer, only to get tons of errors in many of the examples. The BERT one DID work, but it does a poor job of answering even the most basic questions with straightforward context.

Most models you import will have missing tensor data and there is no way to know what to put in it. Even if you work that out, you can’t just feed your inputs into the model and read the result directly from it. You have to write a large amount of Java code to pre- and post-process the text, which is NOT straightforward at all. This pretty much defeats the purpose of using Mendix in the first place.
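To give an idea of the post-processing side: an extractive question-answering model like this one returns two score lists, one for where the answer starts and one for where it ends, and you have to pick the best token span yourself. Below is a rough sketch of that selection step, not the actual run_onnx_squad code. The name best_span is made up, and the sketch assumes you already have the start and end logits as plain lists. The real script also keeps an n-best list, skips spans that fall inside the question, and maps token positions back to the original text.

```python
from typing import List, Tuple


def best_span(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_length: int = 30,
) -> Tuple[int, int]:
    """Hypothetical helper: return (start, end) token indices of the top-scoring span."""
    best_start, best_end = 0, 0
    best_score = float("-inf")
    for start, start_score in enumerate(start_logits):
        # Only consider spans that end at or after the start
        # and contain at most max_answer_length tokens.
        last = min(start + max_answer_length, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best_start, best_end, best_score = start, end, score
    return best_start, best_end


# Made-up logits for a five-token input.
start_scores = [0.1, 2.0, 0.3, 0.0, -1.0]
end_scores = [0.0, 0.5, 3.0, 0.2, -1.0]
span = best_span(start_scores, end_scores)
print(span)
```

The returned indices point into the token list that was fed to the model, so the answer text is the tokens between them, joined back together.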

I wouldn’t recommend the ML Toolkit to anyone right now. It has some potential, but it isn’t going to be useful to anyone until there are built-in pre/post-processing modules as part of the toolkit.

Esteban Siravegna · Answer 2 · 2023-07-18

Hey Brian. We created a demo Mendix app that illustrates several use cases for various types of ML models. BERT is among them, and the app includes examples of Java pre- and post-processors written for that specific version of BERT. Please take a look:
GitHub - mendix/mlkit-example-app: Demo for Mendix MLKit.

Good luck!