UNSW trawls cloud-based data lakes for insight and value, explores AI analytics
Australia’s higher education sector has never faced a more difficult year. In the space of just six months the COVID-19 pandemic forced universities to pivot to a digital learning model, with international students offshore and domestic students at home. They also face the need for a strategy rethink following the Federal Government’s recently announced funding changes.
There isn’t an institution in the land that isn’t performing some form of “what if” scenario planning.
The University of NSW (UNSW) is no exception, but thanks to some extremely fortuitous timing it has access to high quality data to support its deliberations.
By mid-2019 UNSW had, under the leadership of Kate Carruthers, its Chief Data and Insights Officer and a senior lecturer in its School of Computer Science and Engineering, built its first Microsoft Azure based data lake. It has since created a second data lake – again in Azure, but built using Databricks as a curated data collection.
This two-lake approach means that the raw data (for example, the codes associated with offers to university students) is held in the first data lake, while the second stores the curated data, which adds context and shows the offers themselves. It’s that form of data that is most useful for report building, as people don’t have to first interpret the raw data to extract value from it.
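As a rough illustration of the difference between the two lakes, the sketch below turns raw offer codes into curated records that a report builder can read directly. It is not UNSW's actual pipeline: the offer codes, their labels, the field names and the `curate` function are all hypothetical, chosen only to show the idea.

```python
# Hypothetical raw records, as they might sit in the first (raw) data lake.
# The codes and field names are invented for illustration.
RAW_OFFERS = [
    {"student_id": "s001", "offer_code": "UC"},
    {"student_id": "s002", "offer_code": "CO"},
    {"student_id": "s003", "offer_code": "WD"},
]

# Hypothetical lookup that gives each raw code a human-readable meaning.
OFFER_CODE_LABELS = {
    "UC": "Unconditional offer",
    "CO": "Conditional offer",
    "WD": "Offer withdrawn",
}


def curate(raw_records, labels):
    """Return new records with a readable 'offer_status' added to each one.

    The raw records are left untouched, mirroring the idea that the raw
    lake keeps the original data while the curated lake adds context.
    """
    curated = []
    for record in raw_records:
        status = labels.get(record["offer_code"], "Unknown code")
        curated.append({**record, "offer_status": status})
    return curated


if __name__ == "__main__":
    for row in curate(RAW_OFFERS, OFFER_CODE_LABELS):
        print(row["student_id"], row["offer_status"])
```

A report author working from the curated records sees "Conditional offer" rather than having to know what "CO" means.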
Named recently as one of the Global Top 100 Data Visionaries for 2020, Carruthers has championed a cloud-based approach to data management and at UNSW democratised properly controlled access to that data in order to help people across the university do their jobs.
People in different functional roles across the university are now using Power BI and accessing data from the data lake to build the reports that they need to do their job. It’s a good start – but for a visionary like Carruthers it is only the start.
“I want us to be able to talk to our data. I want us to be able to, like in Star Trek, ask ‘how many students do I have’ and have the computer understand that and tell us the answer. What we’re building is basically the building blocks to get to that point.
“Voice recognition is coming along so fast that I think that that’s going to be a huge area for innovation.”
Microsoft caught up with Carruthers recently to explore both her vision for the future and how data is transforming UNSW today.
MS: What does the higher ed landscape look like at present?
KC: There’s been a great deal of uncertainty for the higher education sector in general. And most Australian universities have acknowledged fairly high financial losses due to COVID. So just getting people accurate and timely data about what’s happening with things like that was really fundamental. Luckily, we started this last year.
MS: Your own team must have faced quite a jolt when COVID-19 struck?
KC: Very luckily for us, we were able to move seamlessly online because we’re using Microsoft DevOps for our data ops. When the order came to go home, we just kept working. All we did was move our stand-up into a Teams channel and that was the only change we had to make in how we work.
MS: How are you organised?
KC: There’s less than 10 people in our team and we want to deliver as much value to as many people across the organisation as possible in one go. We don’t want to solve one person’s problem. We want to solve many people’s problems all at once.
We work in an agile method. We use Scrum, and we work in two-week chunks, and we use Microsoft DevOps to drive that. We’re using Microsoft DevOps for our data ops now. We’re managing multiple environments and we bake in cost optimisations. Every month we look at what we’re doing and look at how we can cost optimise. Because with cloud you can very easily let your costs get out of control. We prioritise on the basis of what will deliver the most value to as many people as possible.
MS: How do you democratise data access?
KC: Fundamental to my data strategy was to turn my team into the data engineers and have the domain specialists embedded in the business. We’ve been working with our colleagues in HR – so when we had to suddenly all go off campus our colleagues in HR developed a number of Power BI dashboards that they made available very quickly.
This is all governed, so everybody that comes to us has to get a data sharing agreement, has to have a data owner, and get approval. We’re starting to stop all of that rogue data use. You used to have to log on to a system and download a CSV and nobody knew about it. And then they present their reports, and you’d often get people who’d done different reports in the same meeting and having fights about which data was correct. Now when they’re talking about X, it’s absolutely X and there’s no debate about it.
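The three governance conditions described here (a data sharing agreement, a named data owner, and approval) can be pictured as a simple gate on each access request. The sketch below is only an illustration under that reading; the `AccessRequest` class, its fields and both functions are hypothetical and do not describe any UNSW system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AccessRequest:
    """A hypothetical request to use a governed dataset."""
    requester: str
    dataset: str
    has_sharing_agreement: bool
    data_owner: Optional[str]
    approved: bool


def missing_requirements(request: AccessRequest) -> List[str]:
    """List which of the three governance conditions are not yet met."""
    missing = []
    if not request.has_sharing_agreement:
        missing.append("data sharing agreement")
    if not request.data_owner:
        missing.append("data owner")
    if not request.approved:
        missing.append("approval")
    return missing


def may_access(request: AccessRequest) -> bool:
    """Access is allowed only when nothing is missing."""
    return not missing_requirements(request)


if __name__ == "__main__":
    request = AccessRequest("hr_analyst", "staff_locations", True, None, True)
    print(may_access(request), missing_requirements(request))
```

Returning the list of unmet conditions, rather than a bare yes or no, tells the requester exactly what still has to be arranged.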
MS: Has the data revealed any surprises about the move to hybrid learning?
KC: Now that every course that we offer is available online, the opportunities for contract cheating (where someone hires a third party to complete an assignment for them) have just grown exponentially. We are planning a machine learning proof of concept with support from Insight and Microsoft to start to identify that.
Also, we’re finding that the teachers who embrace online and teach leaning into the medium get better results than the ones who are just doing the same thing that they did in the classroom but now online. If you stood in front of the PowerPoint and talked about the PowerPoint in the classroom and you’re trying to just do that online, it’s not engaging.
There’s a bifurcation in the students. The really good students, they do all their work. They watch the videos. They do all the stuff that they need to do, and they do it all on time. The other students don’t do that and obviously benefit from being in the physical classroom.
We’re trying to work out how we can engage students and I think it’s changing the way that we teach. We will be teaching in hybrid mode for the foreseeable future so even when the domestic students come back on campus, the internationals probably won’t be here. That’s a big shift for us.
We want to identify what are the real issues that are impacting on student experience and student performance, and not mere correlates.
MS: How might you do that?
KC: We’re going to be looking at Machine Learning (ML) DevOps. We want to make people’s lives better. We want to provide people with the information they need to know to do their jobs so that they don’t have to go looking for it. What’s coming next for us is moving into the world of AI ML bots and related technologies for learning analytics.
We’re onboarding the Moodle data for this proof of concept in the data lake. That will then enable us to use ML techniques to do some analytics.
MS: How do you manage access to the systems and to data?
KC: If you think about digital transformation, it is underpinned by two things – data and identity. We are in the process of moving our authentication to Azure Active Directory. That will give a seamless experience.
It’s also an important part for the cyber security piece. In the old days, you’d have your organisation and you’d put your firewalls around it. Everything happened behind the firewalls and it was all on premise and that was your perimeter. Now identity is pretty much your perimeter. Your students and staff are everywhere; your systems are accessible from anywhere – your perimeter is now identity and your data needs to be protected. So, it means that you need to improve your practices around encryption, and you need to have a solid identity platform and be able to adopt things like multi-factor authentication, which are all on our road map.
MS: You’ve recently been named a Microsoft Regional Director for your work in cyber security. Congratulations.
KC: It’s a great honour to be a kind of a trusted advisor to Microsoft to provide a customer voice that’s independent and frank with a strategic focus.
I think the new leadership (Satya Nadella, CEO) has revitalised Microsoft. And really interesting, impressive products are coming out. I’ll be very interested to see what happens now with GitHub and Microsoft DevOps.
I think Microsoft Teams is the best product Microsoft has put out in the last 20 years. It has interesting possibilities for teaching. I can see the real opportunity for the Azure data platform connected to Teams and Office and the Knowledge Graph.
MS: With your cyber hat on – what would your best advice be?
KC: I tell people the same things every time. Use a password manager. Implement multi-factor authentication. Encrypt everything that you can. Increasingly, organisations are going to have an obligation to do privacy by design. How can they bake that into their software development processes and into their application processes?
I think there’s going to be a whole lot more people needing to get their head around cloud from a security perspective and increasingly, organisations will need to step up their cyber security game.
I’m responsible for making sure all of the data in the University is managed and secured appropriately. That’s every kind of data you can imagine from enterprise-type systems to downloaded satellite feeds to climate data. Understanding our data landscape and making sure it’s all secured effectively is what keeps me up at night. We’ve made progress on it, but there’ll always be something that surprises you.