That title was given to me, so I had plenty of wriggle room in preparing this talk! I was one of four speakers at the University’s ICT Graduate Programme one-day workshop on 28 August. I gave a fairly broad overview of what data analytics is, what drives it and what experience I have had in working with large data sets. But I think the part that had most impact was, in a sense, an anti-example, when I read from Ivor Goddard’s article in the June 2009 issue of Significance about the perils of combining data from many administrative sources and drawing inferences. I hope I gave the graduates food for thought towards the end of a busy day.
Raul Fernandez is a PhD student in the Faculty of ESTeM and on 21 August he presented this seminar to confirm his research. His topic looked really meaty; there are probably two or three PhDs in there, so he can pick and choose! He’ll be analysing data from spectroscopes to try to identify pain based on the dynamics of blood flow measured by the machinery. Lots of data wrangling will undoubtedly be involved, and lots of thinking about the time dependence in the data too.
Finally I’ve given a talk in Goulburn, at the 28th edition of the event on 19 August! I only prepared 29 slides but they were more than enough to keep the audience questions flowing for the full 75 minutes. I had to leave out a lot of the material on chronic fatigue syndrome, so I might have to offer to speak in more detail on that project again next year. The problems I did present, on the modelling of Chlamydia pneumoniae and hepatitis, were well received, and I’ve got some interesting leads to follow up around cost functions in decision trees, the interaction between random forests and support vector machines, and new accuracy measures such as the area under ROC curves.
I’ll report on my presentation at Goulburn 28 separately. The other three talks could each be characterised as a solution to a specific problem that opens up further research questions. Robert Clark of the University of Wollongong spoke on “Modelling the relationship between leaf chemistry and koala population density”. The modelling included the very familiar logistic regression, and a linear model following a log transformation. Questions arose around ways in which the sixty-five observations could be made to go further, through further consideration of the variation in the areas where data was collected.
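For readers who haven’t met it, the log-transformed linear model Robert used is easy to sketch. This is just an illustration, with made-up numbers and my own function names, not his analysis:

```python
import math

def fit_loglinear(xs, ys):
    """Least-squares fit of log(y) = a + b*x, i.e. y = exp(a) * exp(b*x)."""
    logy = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(xs) / n, sum(logy) / n
    b = sum((x - mx) * (ly - my) for x, ly in zip(xs, logy)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Hypothetical leaf-chemistry predictor against a density-like response
# generated to follow y = exp(2 + 0.5x) exactly:
xs = [0.0, 1.0, 2.0, 3.0]
ys = [math.exp(2 + 0.5 * x) for x in xs]
a, b = fit_loglinear(xs, ys)  # recovers a = 2, b = 0.5
```

The appeal of the log transformation is that multiplicative effects on the response become additive, so the familiar linear-model machinery applies.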
John Newman of the Australian Bureau of Statistics spoke on “Protecting aggregate business microdata at the Australian Bureau of Statistics”. His multiplicative model was straightforward and effective but raised issues around the uses to which the protected data could be put.
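The talk didn’t dwell on mechanics, but the flavour of multiplicative perturbation is easy to convey: each confidential value is multiplied by random noise centred on one. A minimal sketch, with made-up figures and my own function name rather than anything the ABS actually uses:

```python
import random

def perturb(values, sigma=0.1, seed=42):
    """Protect values by multiplying each one by Gaussian noise with mean 1."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [v * rng.gauss(1.0, sigma) for v in values]

turnover = [120_000, 450_000, 98_000]  # hypothetical business turnover figures
protected = perturb(turnover)          # each value nudged by roughly +/-10%
```

The attraction is that relative error is roughly constant across small and large businesses alike; the catch, as the questions brought out, is what analyses remain valid on the perturbed data.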
Esteban Munoz of the Universities of Hamburg and Canberra spoke on “Using spatial microsimulation for the estimation of heat demand.” He recently released an R package implementing the GREGWT method of microsimulation. He’s using his model to estimate heat demand in German apartment buildings, and fielded a number of questions about possible extensions to his model to include random effects or a regularisation term.
The HRI is a big grouping of researchers across the University and this forum provided the opportunity for three of its members to showcase their current projects. Regan Ashby from Science presented work in progress on the relationship between exposure to sunlight and myopia, with the clear implication that time spent outdoors matters much more than the amount of near work children do in terms of developing myopia. Lisa Scharoun from Graphic Design described a project she had worked on to design public spaces, beyond just shopping malls, for the elderly to use or even play in. Fanke Peng, also from Graphic Design, talked about her work on wearable computers that enliven public spaces with memories as you walk through them.
Even at the rate of three presenters a week it’ll take several months to get through all HRI members and I look forward to many more such forums in weeks to come!
When a couple of statisticians start talking they realise how much statistics is going on around the campus in disconnected locations. So sixteen of the most active researchers in statistics, and researchers with statistics, from every faculty and research centre came together on August 6 for 10 minute talks. I spoke on multi-level modelling, channelling Gelman & Hill, Goldstein and Diggle, Liang & Zeger. Thanks are also due to David Warton, whose talk on 1980s approaches to ecological modelling gave me the idea for how to introduce my talk.
A few good hoary chestnuts were raised. Can you treat Likert scale responses as continuous? Is null hypothesis significance testing dead and buried? What is your favourite complex summary statistic? By the way, mine’s kurtosis, not in spite of but, I think, because of its demise being noted by Westfall in a 2014 American Statistician article.
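Since kurtosis got a mention, here is the quantity I mean, as a quick sketch using the simple moment estimators (not the bias-corrected version some packages report):

```python
def excess_kurtosis(xs):
    """Fourth standardised moment minus 3 (0 for a normal distribution)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n  # variance (population form)
    m4 = sum((x - m) ** 4 for x in xs) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

excess_kurtosis([1, 2, 3, 4, 5])  # flat-ish data, so negative: -1.3
```

Positive values indicate heavier tails than the normal distribution, negative values lighter ones, which is exactly the interpretability question Westfall poked at.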
It took three and a half hours for someone to mention Bayes … And it was Xavier in the context of plans to extend the analyses available in the AURIN portal.
Professor Jill Adler came from South Africa / London to present this talk on 25 June. Her project on teacher development encompassed everything from quality and results to capacity and leadership. She started with a statistic: 90% of maths ed research is done on 10% of the children, those in the developed world. An important concept in teacher development that Jill used was about working on the lesson, not the teacher. She also talked about the five doctoral students who worked on the project and their contributions.
At the Maths Education symposium on 24 June I presented a subset of the lexical ambiguity projects I’ve been working on for the past couple of years, under the title “Say what you mean, mean what you say: lexically ambiguous words in statistics”. I spoke about the main parts of the lexical ambiguity project, focusing on the pre-post data on definitions, and the interventions. I ended up offering a free coffee to anyone who found the secret sentence hidden in the word find, which was claimed by mid-morning of the next day!
Cognitive style, spatial visualisation and problem solving performance
Ajay Ramful is working on the ARC project “Processing mathematics tasks” and his talk was on one of the aspects of that project. The C-OSIVQ questionnaire was used to assess cognitive style. The paper folding test was used to assess spatial visualisation. Tom Lowrie’s Mathematical Processing Instrument was used to assess problem solving. Correlations tended to be low in absolute terms, but with a sample size of over 700 students statistical significance was reached in a number of circumstances, indeed a correlation close to 0.5 in education research is apparently generally regarded as cause for celebration!
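To see why even modest correlations reach significance with a sample that size, the usual t statistic for a Pearson correlation makes the point. A sketch (using 1.96 as the rough two-sided 5% threshold):

```python
import math

def corr_t_stat(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))

corr_t_stat(0.1, 700)  # about 2.66, comfortably past 1.96
corr_t_stat(0.1, 100)  # about 1.0, nowhere near significant
```

So with 700 students a correlation of 0.1 is “significant” while explaining only 1% of the variance, which is why the effect size, not the p-value, is what gets celebrated.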
Facebook as a learning space
Sitti Patahuddin is also in the Maths Education Research Centre here at UC and she spoke about Facebook as a community of practice for teachers. Her study uses data from Indonesia. She used Statista, a statistics portal with data from 18000+ sources. She put up a post about dividing a whole number by a fraction and analysed the responses using NVivo.
Tracy Logan spoke about using data mining and secondary data analysis to achieve sustainable maths education research. She put up a diagram of the design and discussed many of the issues around it, rather than the specifics of packages and so on. The prospect of software to read microfiches of AMT data from the 80s was tantalising too – that’s my data they’ll be finding there!
Mathematical research at UC
Peter Vassiliou gave a personal perspective on research in mathematics. After some startling revelations about physicist Einstein and mathematician Minkowski, he went on to talk about symmetry, geometry and physics, the three main planks of mathematical research at UC. He made a good point: using maths doesn’t make you a mathematician. Same with stats, really.