Azure ML Investigations
A customer where I work suddenly had an issue: the machine learning model they had deployed to AKS over a year ago stopped working. They asked if someone could take a look into it. I always say jump at the chance to learn something new and perhaps blog about it (I always forget to blog, so here we are).
Now, I have never really done anything with Azure ML before. I have clicked around the portal for a few minutes and that’s it. So here is how I went about it, just in case you have to do the same thing at some point.
An email from the customer showed me the error they were receiving, which said “Internal Server Error. Run: Server internal error is from Module Execute Python Script”.
OK, so straight away I’m thinking it’s something to do with Python. Let’s try to replicate the error first of all (always my first port of call when debugging issues). I opened up Postman, which I can use to test the REST call, sent a POST to the URI and, yeah, I saw the same issue. The customer had mentioned AKS, so I looked into that, and all appeared to be fine there.
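If you would rather script that check than click through Postman, here is a minimal sketch of the same POST using only the Python standard library. The URI, the key and the payload shape are hypothetical placeholders (I can't share the customer's), and the Bearer header assumes the endpoint uses key-based auth, so adjust for your own service.

```python
import json
import urllib.error
import urllib.request


def build_scoring_request(uri, api_key, payload):
    """Build a POST request carrying a JSON body and a bearer token."""
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # Assumption: key-based auth is enabled on the endpoint.
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(uri, data=body, headers=headers, method="POST")


def call_endpoint(request, timeout=30):
    """Send the request and return (status_code, response_text)."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # A 500 Internal Server Error lands here; the body usually
        # contains the error message the service sent back.
        return err.code, err.read().decode("utf-8", errors="replace")


# Usage (placeholders, replace with your own values before running):
# request = build_scoring_request(
#     "https://<your-endpoint>/score",      # hypothetical scoring URI
#     "<your-key>",                         # hypothetical API key
#     {"Inputs": {"input1": [{"x": 1}]}},   # assumed payload shape
# )
# print(call_endpoint(request))
```

Catching HTTPError matters here, because a 500 would otherwise just raise an exception and you would lose the response body, which is the bit you actually want to read.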
Time to crack open the Azure Machine Learning studio and go on the hunt for anything that might look interesting. I knew the name of the model from the customer, so I started poking around there. I clicked on the test option, ran a test, and got an error with a stack trace still mentioning an internal server error, in what looked like Python version 3.6.
The code hadn’t been changed or redeployed for over a year but had suddenly stopped working, and I’m thinking that’s odd, there must be some sort of dependency being pulled in when it runs. After a fair bit of googling I came across this link: “Python SDK release notes – Azure Machine Learning | Microsoft Docs”. After reading it carefully, I found that it mentions a breaking change in the Azure Machine Learning SDK for Python version 1.41.0.
Now, unfortunately, I spent a fair bit of time on this, trying to figure out how, or even where, to update the SDK version in the model, with absolutely no luck whatsoever.
I then ended up on a call with a super helpful Microsoft engineer who knows Azure ML inside out. After discussing the issue, we figured out between us that a dependency on a package called Vowpal Wabbit was causing the problem: the latest version of this package required a Python version newer than 3.6. Because the version wasn’t pinned, the model had started pulling in that newer release. So we figured, let’s try pinning the version of the package in the designer to Vowpal Wabbit 8.10.1. I rebuilt the model and, bish bash bosh, it all works again like a charm.
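For anyone wondering what pinning can look like, here is a rough sketch of one way to do it from inside an Execute Python Script module. This is an illustration rather than the exact change we made: I’m assuming the package is installed with pip under the PyPI name vowpalwabbit, and the helper name pinned_install_command is made up for this example.

```python
import subprocess
import sys

# Assumptions: "vowpalwabbit" is the PyPI package name, and 8.10.1 is
# the version that still worked for us on Python 3.6.
PACKAGE = "vowpalwabbit"
VERSION = "8.10.1"


def pinned_install_command(package, version):
    """Return a pip command that installs exactly one version of a package."""
    # The "==" specifier stops pip from picking up whatever is newest.
    return [sys.executable, "-m", "pip", "install", f"{package}=={version}"]


def azureml_main(dataframe1=None, dataframe2=None):
    # Entry point that the Execute Python Script module calls.
    # Install the pinned version before anything imports the package.
    subprocess.check_call(pinned_install_command(PACKAGE, VERSION))
    # ... the existing script logic would carry on here ...
    return dataframe1,
```

The important part is the == in the version specifier. Without it, every rebuild grabs the newest release, which is how a model that nobody has touched for a year can suddenly break.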
The very best part of this was that I reached out on Twitter and three very kind people asked if they could help. I want to say a huge thank you to Kevin Oliver (@TechnicalPanda), Pedro Fiadeiro (@plfiadeiro) and Sammy Deprez (@sammydeprez). Without your help I would have been very stuck, so many thanks to you all!
In summary, I learned a lot about Azure ML and how models are used, tested and then deployed. I went down many a rabbit hole (pun intended) and eventually came up trumps. I knew next to nothing about Azure ML before this came up, but I said I would fix it and I never gave up. This is what I love doing: fixing stuff I know nothing about by asking questions and finding out the answers.
In the unlikely event that this helps anyone, awesome. If not, thanks for reading.
If you have questions, reach out to me in the comments below or on Twitter.
Don’t forget to subscribe to my YouTube channel.