Using JSOUP to log in to Mendix and scrape a webpage

0
Hi, I’m trying to use JSOUP to log into Sprintr and fetch information from a project’s page, but I’m stuck on the login process. Any hints how to make JSOUP login and create the necessary cookies to be able to view the Sprintr project page?

https://stackoverflow.com/questions/41209009/jsoup-automatically-login-to-a-website?noredirect=1

    Connection.Response loginForm = Jsoup.connect("https://backoffice.holidayinsider.com/backoffice2/login")
            .method(Connection.Method.GET)
            .execute();

    Connection.Response mainPage = Jsoup.connect("https://backoffice.holidayinsider.com/backoffice2/login")
            .data("username", "myusername")
            .data("password", "mypass")
            .cookies(loginForm.cookies())
            .followRedirects(true)
            .method(Connection.Method.POST)
            .execute();

    System.out.println(mainPage.parse());

    Map<String, String> cookies = mainPage.cookies();

    Document evaluationPage = Jsoup.connect("https://backoffice.holidayinsider.com/backoffice2/")
            .cookies(cookies)
            .get();

    System.out.println(evaluationPage);

So any experience doing something like this that someone can share?

So for now I see the following flow (traced with dev view in browser). Say I want to access the page https://sprintr.home.mendix.com/link/myapps:

1. Enter https://sprintr.home.mendix.com/link/myapps
2. Get redirected to https://sprintr.home.mendix.com/openid/login?continuation=link/myapps
3. Get redirected to https://login.mendix.com/oauth/authorize (lots of parameters)
4. POST payload to https://login.mendix.com/oidp/dispatch (username, password, login_method, resource_id, idp_hint)
5. Get redirected to https://sprintr.home.mendix.com/openid/callback (lots of parameters)
6. Get redirected to https://sprintr.home.mendix.com/link/myapps
7. Get redirected to https://sprintr.home.mendix.com/index.html

So how to get the parameters required for steps 3 and 5 (if needed)? How to get the resource_id in step 4 (if needed)? And how would the request flow be in JSOUP to get this working?
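On the parameters in steps 3 and 5: these usually do not have to be built by hand, because the server puts them in the redirect URLs, and a client that follows redirects while keeping cookies ends up at a URL that already carries them. The sketch below illustrates that idea using only the JDK's `java.net.http` client instead of Jsoup, so it is a swapped-in technique and not code from this thread. The class name `RedirectParams`, both method names, and the sample parameters `example_param` and `other` are hypothetical; the real parameter names are not given in the question. Whether the Mendix login accepts such a client at all is an assumption.

```java
import java.net.CookieManager;
import java.net.URI;
import java.net.URLDecoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class RedirectParams {

    /**
     * Follows redirects from startUrl while storing cookies, and returns the
     * URI of the response finally received (steps 1 to 3 of the flow above).
     * Needs network access, so it is not called from main.
     */
    public static URI followToFinalUri(String startUrl) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(new CookieManager())          // keeps session cookies between hops
                .followRedirects(HttpClient.Redirect.NORMAL) // follows 3xx responses
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(startUrl)).GET().build();
        HttpResponse<Void> response =
                client.send(request, HttpResponse.BodyHandlers.discarding());
        return response.uri();
    }

    /** Splits the query string of a URI into decoded name/value pairs, in order. */
    public static Map<String, String> queryParams(URI uri) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = uri.getRawQuery();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        // Hypothetical redirect target: the parameter names here are made up
        // for illustration and are not the ones Mendix actually sends.
        URI sample = URI.create(
                "https://login.mendix.com/oauth/authorize?example_param=a%20b&other=1");
        System.out.println(queryParams(sample));
    }
}
```

In Jsoup the equivalent starting point would likely be `Connection.Response.url()`, which reports the URL of the response after redirects have been followed. Values such as resource_id in step 4 would then have to come from wherever the login page supplies them (for example hidden form fields), which this sketch does not cover.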
asked
2 answers
2

I don’t have experience with Jsoup, but a quick search reveals that it parses HTML and does not interpret JavaScript. Mendix apps are SPAs (single-page applications) that require JavaScript to work, so what you are trying to do will not work unless you invest a huge amount of time in it. You have a better chance with tooling like Selenium, which runs automation inside a real web browser.

If all you want is to get your list of apps, there’s a much easier way:

https://docs.mendix.com/apidocs-mxsdk/apidocs/projects-api

Or look at the other SDK APIs.

 

answered
0

After some playing around, I got the login portion working, but like you said, the result is not complete… I’ll try the Selenium route (I saw there are Java libraries as well). Thanks for the suggestion!

answered