25 years as a data scientist and statistician — reviewing my favorite topics

Gerhard Svolba
Jul 2, 2020

In July 1995 I started my career as a researcher at the department for Medical Statistics at the Medical University of Vienna. Packed with knowledge from my studies, I started a great journey into the world of statistics, analytics, and data science. I am happy to say with complete conviction: This was the right decision for me!

The people, business questions, projects, conferences, customers, software packages, and students that accompanied me on that journey allowed me to learn and grow in this field and broadened my analytical and personal horizons a lot.

In this article I am going to look back on the 25 most interesting topics/events/analyses that I came across over the last 25 years. Listed in random order, and added on a daily basis, they tell a story of some of my highlights and give you an impression of why I love my job as a data scientist and statistician.

Table of Contents

  1. Simulating the Monopoly Board Game
  2. Writing the book “Applying Data Science — Business Case Studies Using SAS”
  3. Teaching statistics in primary school — My Dad has the most interesting job in the world. He is a data scientist
  4. Using mathematical optimization to generate business decisions
  5. Heading up the Analytic Community in the Austria-Germany-Switzerland region
  6. Sports Analytics — Supporting the Austrian Volleyball team with analyses for the European Championship 2019
  7. Demand Forecasting and Demand Planning in the retail and in the manufacturing industry
  8. Simulating the Water Level at Lake Neusiedl
  9. Teaching the SAS Statistics/Machine Learning/Data Mining Education Trainings
  10. Applying Artificial Intelligence in a #data4good Project to protect our Environment
  11. Working as a Researcher in Medical Statistics and Biometry
  12. Yield Management in the Transportation Industry
  13. Data Science in Academics — Teaching at Universities and Business Schools
  14. Presenting Data Science Topics at large Analytic Conferences
  15. Writing the book “Data Preparation for Analytics Using SAS”
  16. Analyzing and Predicting Customer Behavior in the Telecommunications Sector
  17. Telling Statistical Stories
  18. Attending the SAS Global Technology and Product Manager Meetings
  19. Solving Statistical Problems and Performing Monte Carlo Simulations

1. Simulating the Monopoly Board Game

Back in primary school I loved to play “DKT”, which is the Austrian equivalent of the Monopoly board game. So it is not surprising that I later used SAS Viya to simulate the processes in the board game to get a more detailed view on

  • the visit frequency on the fields of the game
  • the profitability distribution of the properties that you can buy
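
As a rough illustration of the first point (not the SAS Viya implementation described here), a visit-frequency simulation can be sketched in a few lines of Python. The board layout is an assumption: 40 fields, two dice, and a single "go to jail" field at position 30 that moves the player to position 10. Event cards, doubles, and the actual DKT field layout are left out.

```python
import random
from collections import Counter

BOARD_SIZE = 40   # assumption: classic 40-field layout
GO_TO_JAIL = 30   # assumption: landing here sends the player to jail
JAIL = 10         # assumption: position of the jail field


def simulate_visits(n_turns=200_000, seed=42):
    """Return the relative visit frequency of each field after n_turns dice rolls."""
    rng = random.Random(seed)
    position = 0
    visits = Counter()
    for _ in range(n_turns):
        roll = rng.randint(1, 6) + rng.randint(1, 6)   # two six-sided dice
        position = (position + roll) % BOARD_SIZE
        if position == GO_TO_JAIL:
            position = JAIL                            # turn ends on the jail field
        visits[position] += 1
    return {field: visits[field] / n_turns for field in range(BOARD_SIZE)}


if __name__ == "__main__":
    frequencies = simulate_visits()
    top5 = sorted(frequencies, key=frequencies.get, reverse=True)[:5]
    print("Most visited fields:", top5)
```

Even this reduced rule set shows that the fields are not visited equally often, which is the starting point for comparing the profitability of properties.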

Over the last years I have presented this topic at many conferences, used it in my lectures at the business school of Burgenland, and included a full case study in my book Applying Data Science: Business Case Studies Using SAS.

You can watch my webinar recording on this topic and review the slides.

2. Writing the book “Applying Data Science — Business Case Studies Using SAS”

Writing data science and analytic books has always been important to me. In my career I had the chance to work on so many interesting statistics and data science topics in customer and research projects. And books turned out to be a great way to provide this knowledge to our customers and the user community.

Applying Data Science: Business Case Studies Using SAS is my third SAS Press book and contains a collection of data science case studies with business questions, results, and SAS code. The book was the best-selling SAS Press book at SAS Global Forum 2018 in Denver, and I regularly give presentations and webinars on these topics.

You can view a recording of my data science webinar on this topic.

3. Teaching statistics in primary school — My Dad has the most interesting job in the world. He is a data scientist.

When my sons were in primary school I agreed with the teacher that I could teach their class for half a day on statistics. I always enjoyed this exercise and can highly recommend replicating this idea if you have the chance to do so. Children are a very demanding audience and you should be prepared to explain things in appropriate language. However, this endeavor is highly rewarding. I always benefited from their questions and could take away some learning for myself as well.

Read my article that summarizes my experience.

4. Using mathematical optimization to generate business decisions

You can find this topic under many names: prescriptive analytics, optimization, operations research, or decision support. I always enjoy these projects because the results are concrete business actions that help decision makers to better run their business, e.g. how you should adjust which quantities at which point in time for a specific product and geographic region.

At the A2012 conference in Cologne I presented how we worked together with Europcar Austria, who used SAS for the optimal re-location of their cars to adjust to changing seasonal demand patterns and for the fleet-in/fleet-out planning of their cars.

For RHI-Magnesita I programmed the delivery network optimization, which recommended to decision makers which product to produce and ship from which plant in which quantity to optimally fulfill the expected demand. After I had formulated the problem, I was always amazed how short the final code in the SAS OPTMODEL procedure was.
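
To give a feeling for the structure of such a delivery network problem, here is a toy sketch in Python. It is not the OPTMODEL code from the project. All plant names, customer names, capacities, demands, and costs are hypothetical, only a single product is considered, and the optimum is found by exhaustive search over shipment quantities in steps of 10 units, where a real project would use a linear programming solver.

```python
from itertools import product

# Hypothetical data: plant capacities, customer demands, shipping cost per unit
capacity = {"plant_a": 60, "plant_b": 50}
demand = {"customer_x": 40, "customer_y": 50}
cost = {
    ("plant_a", "customer_x"): 4,
    ("plant_a", "customer_y"): 6,
    ("plant_b", "customer_x"): 5,
    ("plant_b", "customer_y"): 3,
}


def best_plan(step=10):
    """Enumerate all shipment plans that meet demand and return the cheapest feasible one."""
    best_cost, best = None, None
    # For each customer, try every quantity supplied by plant_a; plant_b ships the rest.
    ranges = [range(0, demand[c] + 1, step) for c in demand]
    for from_a in product(*ranges):
        plan = {}
        for customer, qty_a in zip(demand, from_a):
            plan[("plant_a", customer)] = qty_a
            plan[("plant_b", customer)] = demand[customer] - qty_a
        # Skip plans that exceed the capacity of any plant
        shipped = {p: sum(q for (plant, _), q in plan.items() if plant == p) for p in capacity}
        if any(shipped[p] > capacity[p] for p in capacity):
            continue
        total = sum(cost[route] * qty for route, qty in plan.items())
        if best_cost is None or total < best_cost:
            best_cost, best = total, plan
    return best_cost, best


if __name__ == "__main__":
    total_cost, plan = best_plan()
    print("Minimal cost:", total_cost)
    for route, qty in plan.items():
        print(route, qty)
```

The three ingredients are the same as in a full formulation: decision variables (quantity per plant and customer), constraints (demand and capacity), and an objective (total cost).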

In times of fighting against COVID-19, colleagues at SAS have compiled great solutions for medical resource optimization. I like the example for venue optimization where we offer a solution that calculates an optimal seating plan for concert halls and stadiums according to distancing rules and other constraints.

5. Heading up the Analytic Community in the Austria-Germany-Switzerland region

In 2017 I took over the responsibility to run the analytic community in the Austria-Germany-Switzerland region. There are now 33 data scientists in this community and I feel honored to work together with them. They are all experts in various fields: data mining, machine learning, deep learning, computer vision, statistics, forecasting, econometric modeling, optimization, text analytics, visualization, decision management, DevOps, model management, and many more. Working together with my colleagues has greatly extended my horizon and my knowledge in statistics and data science.

The strength of the community is also its variety. Some of us are in their 20s, others in their 30s, 40s, and 50s. This also brings different points of view on how to tackle analytical problems. Our favorite programming and tool portfolio ranges from programming in Python in the Jupyter Notebook to, obviously, SAS code and SAS procedures, to the modeling pipelines in the visual interface, and also includes creating and using self-service analytic dashboards. I enjoy working together with our young data scientists and learning how they view and solve their analytical tasks, and I am happy to advise with my experience on how to design analytic solutions for our customers.

Altogether we have a solid three-digit number of years of experience in analytical projects in various business domains, such as risk management, fraud detection, demand planning, campaign management and customer analytics, production and quality analytics, medical research, and many more.

The diversity in business domains and analytical disciplines is definitely one of the most important features that makes my job so exciting.

6. Sports Analytics — Supporting the Austrian Volleyball team with analyses for the European Championship 2019

In 2019 we supported the Austrian Volleyball team with analyses and self-service analytics dashboards for their preparation for the European Championship 2019. The aim of our analysis was to detect patterns in the player behavior of competitor teams, e.g. finding sequences and repetitions in their serving behavior. We also analyzed relationships in the behavior of the Austrian players:

  • The combination of which factors most often result in a point?
  • Under which circumstances do certain players act most efficiently?

I had the chance to work together with the head-coach Michael Warm. The conversations with him also gave me additional insight into what it means to apply analytics results in a well-established process.

“If I interrupt the game and change the tactics based on data science results, I need to be really sure that this action is successful. After some false decisions the team will lose confidence in my advice.”

It is a big difference whether you just build the model and communicate its misclassification rate or whether you are the one who stands at the sidelines and interferes with the game. The same is true for all who use our analytic results in fraud detection, risk management, customer analytics, or demand forecasting. They can only defend the models if they have enough trust in and enough insight into the results.

https://computerwelt.at/news/volleyball-nationalteam-setzt-auf-big-data-analytics/

At the end of the summer 2019 I had the chance to visit head-coach Michael Warm and his team in their training camp. I was totally impressed by the commitment and the passion of the team members. It motivated me a lot to be among such hard-working professionals and I was able to take this momentum with me into my work life as a data scientist.

7. Demand Forecasting and Demand Planning in the retail and in the manufacturing industry

From 2005 to 2009 I had the chance to work together with Swarovski AG on the demand planning in their consumer goods business. We analyzed the demand patterns for their fashion and home decor products and used SAS forecasting to automatically create and update demand forecasting models for their more than 1,000 articles.

For many of Swarovski’s articles the creation of forecast models is quite a challenging task. This is due to the fact that many articles do not have a demand history of more than 12 or 24 months. The demand planning department also needed forecasts for articles that had not even been launched on the market yet. Consequently we could not use classical time series techniques for all articles. We had to apply new product forecasting techniques where we built predictive models that predicted the demand based on product features.

We also created similarity search models that identify a subset of existing products that are most similar to the target product, and derived the demand pattern from these reference products. Thus we were able to generate demand forecasts for products that had not even been launched yet but only designed in the labs.
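
The idea behind such a similarity search can be sketched as follows. This is a minimal illustration, not the models used in the project: the product names, the two normalized features, and the demand figures are hypothetical, and similarity is simply measured as Euclidean distance between feature vectors.

```python
import math

# Hypothetical reference products: normalized features (price, size)
# and demand in the first three months after launch
history = {
    "crystal_swan": {"features": (0.8, 0.3), "demand": [100, 80, 60]},
    "crystal_bear": {"features": (0.7, 0.4), "demand": [120, 90, 70]},
    "pendant": {"features": (0.1, 0.9), "demand": [400, 300, 250]},
}


def forecast_new_product(features, history, k=2):
    """Average the demand pattern of the k reference products closest in feature space."""
    ranked = sorted(history.values(), key=lambda p: math.dist(features, p["features"]))
    references = ranked[:k]
    n_periods = len(references[0]["demand"])
    return [sum(p["demand"][t] for p in references) / k for t in range(n_periods)]


if __name__ == "__main__":
    # A new product that resembles the two crystal figurines
    print(forecast_new_product((0.75, 0.35), history))
```

In practice the choice of features, their scaling, and the number of reference products would all need to be validated against products with a known demand history.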

It has been a great experience to see how demand planners use our models for the rolling forecasts and also for the budget planning. This project was not only interesting from an analytical perspective. It was also a lot of work to pull all the data from their historic sales and demand data together to be able to feed the models and to periodically create the forecasts. It was great to see our customer Swarovski presenting at the SAS Analytics conference in Cologne in 2012.

8. Simulating the Water Level at Lake Neusiedl

In May 2020 I combined my favorite sport, sailing, with my passion for data science. As the water level at Lake Neusiedl was very low in May, due to a very dry start of the year 2020 from January to May, I ran simulation scenarios to see how the water level might look in the summer months June, July, August, and September.

I created a simple “concatenation model” and simulated what might happen if the year 2020, starting with June 1st, continued like the previous years 2019, 2018, and 2017.

I also created a regression model with 2 parameters, “Number of Hot Days” and “Rain in mm”, using SAS Viya. In order to visualize the different outcomes for different parameter settings I created a simulation dashboard with sliders that allow you to adjust to different assumptions.

This work has been published as a YouTube webinar. It has been a great experience to provide this link also to my friends (who mostly only know me from sailing and not as a data scientist) and to other organizations that are involved in the topics “sailing” and “water level”, and to see their feedback.

The video and the work received some press coverage here in our area and people from different areas approached me to discuss the analysis in more detail.
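
One possible reading of such a concatenation model is sketched below: the current level is continued with the level changes that a reference year showed from June 1st onwards. This is an assumption about how the model works, not the original implementation, and all water levels (in meters above sea level) are hypothetical illustration values, not measurements.

```python
def concatenate_scenario(current_level, reference_levels):
    """Continue from current_level using the changes observed in a reference year.

    reference_levels holds the reference year's levels at successive dates,
    starting with June 1st; the first entry is only used as the baseline.
    """
    baseline = reference_levels[0]
    return [round(current_level + (level - baseline), 2) for level in reference_levels[1:]]


if __name__ == "__main__":
    # Hypothetical levels on the 1st of June, July, August, September, October
    reference_years = {
        2019: [115.50, 115.42, 115.33, 115.28, 115.30],
        2018: [115.55, 115.50, 115.38, 115.31, 115.33],
    }
    level_june_1st_2020 = 115.30  # hypothetical starting value
    for year, levels in reference_years.items():
        print(year, concatenate_scenario(level_june_1st_2020, levels))
```

Each reference year yields one scenario, so the spread between the scenarios gives a first impression of the plausible range for the summer months.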

9. Teaching the SAS Statistics/Machine Learning/Data Mining Education Trainings

Before I joined SAS as a data science consultant and analytic solution architect, I worked at the Medical University of Vienna as a researcher and teacher. I always loved to teach statistical concepts as well as software features. Consequently I also taught selected statistics, machine learning and data mining classes in my role as SAS consultant. Here is a list of selected courses that I held:

  • Applied Analytics Using SAS® Enterprise Miner™
  • Advanced Predictive Modeling Using SAS® Enterprise Miner™
  • Multivariate Statistical Methods: Practical Research Applications
  • Predictive Modeling Using Logistic Regression
  • Categorical Data Analysis Using Logistic Regression
  • Applied Clustering Techniques
  • Development of Credit Scoring Applications Using SAS® Enterprise Miner™
  • Data Preparation for Data Mining
  • Extending SAS Enterprise Miner with User Written Nodes
  • Neural Network Modeling
  • Decision Tree Modeling
  • Forecasting Using SAS Software: A Programming Approach
  • Forecasting Using SAS Forecast Server Software
  • Introduction to the Programming with SAS/IML Software
  • Fitting Poisson Regression Models Using the GENMOD Procedure
  • Managing SAS® Analytical Models Using SAS® Model Manager

All these engagements extended my statistical horizon. Not only by getting into the details of the course materials, but especially by interacting with my students and answering their questions. I still like to use webinars to present and explain statistical methods and how to apply them in SAS. For example, how you can calculate and display individual validation limits with SAS Visual Analytics.

SAS offers a great collection of statistics, machine learning, and data mining classes. In these classes, students not only learn about the software options in SAS, they also see concrete business examples and how to select and apply the respective analytical methods. Check out the large offering of SAS education classes, both classroom courses and e-learning.

In 2008 I also developed a training class for SAS: “Building Analytic Data Marts” and ran this course from 2008 to 2018. The content is taken from my two SAS Press Books: Data Preparation for Analytics Using SAS and Data Quality for Analytics Using SAS.

10. Applying Artificial Intelligence in a #data4good Project to protect our Environment

In 2019 and 2020 I had the chance to work together with researchers from the International Institute for Applied Systems Analysis (IIASA) and with AI experts from our headquarters in Cary on a computer vision project to classify the deforestation status in the Amazon rainforest.

In the project a computer vision model was built that classifies satellite pictures for the existence of human intervention in the Amazon rainforest. Of course I love to work on my business projects; however, it has been a great experience to apply artificial intelligence and the capabilities of SAS Viya to help protect our planet.

We also started a crowdsourcing initiative in order to increase the set of labeled images to be able to fine-tune the model. You can help us and label some satellite pictures and you can review the progress of the crowdsourcing work here.

We are currently discussing additional topics where we can work together with IIASA and where we can apply machine learning to important topics.

11. Working as a Researcher in Medical Statistics and Biometry

Before I joined SAS, I worked as a researcher at the department for Medical Statistics at the Medical University of Vienna from 1995 to 1999. In that role I did a lot of statistical consulting for medical doctors for their research studies and performed many analyses for clinical trials, pharmaceutical studies, and other biometric topics.

It might sound funny, but I learned statistics during that time. Of course I brought a lot of knowledge with me from my master's studies. However, working in biometry and medical statistics taught me thoroughly what it means to apply analytical methods. I still benefit a lot from the many dimensions:

  • dealing with high variability, small sample sizes and sparse data
  • handling data preparation and data quality issues
  • having to carefully select the appropriate statistical method based on the business questions and the nature of the data
  • being forced to be extremely accurate as the outcome directly affects our health
  • having to explain statistics to people who are experts in their field but have only little knowledge of quantitative analysis

When I review many of the #data4good projects that we see these days, I can say: I performed a lot of real #data4good in my career as researcher and I am proud of that.

I never completely left this area. In my SAS career I had the chance to work in a few clinical trial projects. From 2011 to 2014 I was part of the board of the Vienna Biometric Society, and I still teach a methodological seminar for the medical students and frequently exchange ideas with my former colleagues from the Medical University. It is a great experience to discuss with them. They all have highly responsible jobs now and head up departments or workgroups in the academic area or work together with the EMA (European Medicines Agency), which controls the admission of new pharmaceutical products.

12. Yield Management in the Transportation Industry

Many business questions that I had to analyze in my data science projects are centered around the analysis subject “CUSTOMER”. Consequently most of my analysis tables had the customer, the contract, or the customer transaction as the main analysis entity. There are however many interesting analyses that are not focused on the “customer” entity and I always enjoyed working on them. I already mentioned demand forecasting for Swarovski and the transportation optimization for Europcar. Here is another example from the transportation industry. We performed Yield Management for a railroad organisation and analyzed together with them how to optimize tariffs and promotions.

These projects are challenging from many different angles:

  • You usually deal with big data when analyzing detailed sales data. And there are cases where you cannot just aggregate the data but have to provide selected derived variables already on the detail data.
  • Data quality varies across historic periods, IT systems, and also regions. Some data are not 100 % available. And you have to make some educated guesses about how your sample is biased and how you correct your data. Analytics can help here to provide meaningful imputation values across regions, seasons, segments, and data collectors.
  • Visualization is key. Over the last years the visualization capabilities have strongly improved, especially when it comes to interactive charts, big data visualizations, geo maps, and the consideration of the time axis. Some of my projects started much earlier, so we had to be flexible with the visualizations. In our project we enjoyed the flexibility of SAS/GRAPH and the capabilities to create individual geo map representations a lot. In this example you see that we mapped the course of the rail service and provided branches for the smaller local railways where we only had aggregated data.

Our project brought data-driven answers for two very important business questions.

  • Which travel relations (routes) are highly competitive and where can we quote a higher price as there is high demand?
  • Which segments between individual stations have high passenger numbers, a strong seasonal variation, or varying booking patterns?

13. Data Science in Academics — Teaching at Universities and Business Schools

I always enjoyed teaching students. During my time as a lecturer at the Medical University I taught the statistics classes and a class “Statistics and Documentation” for the dietary assistants. This has been a great experience as I learnt very early how to explain statistics and the application of complex methods.

In my SAS career I continued teaching as a part-time lecturer. For the medical students I teach the methodological seminar, where they learn how to apply statistical methods for their diploma thesis. At the University of Applied Sciences Upper Austria I teach data science for the Global Sales and Marketing Master students. At the University of Applied Sciences Burgenland I teach Visual Analytics and Data Science. And I taught Data Science Case Studies Using SAS in the summer term in the last two years at the University of Vienna.

I always try to let my students feel and touch statistics as directly as possible. We create for example a living scatter plot and we discuss the features of a lift-chart based on the ordering of the students in a line.

There is a lot of payback through these teaching exercises:

  • I get feedback from young students and learn how they view business problems, how they would approach them, what questions they have, and what their expectations of data science are.
  • And I continuously improve my “explain statistics” capabilities, as students and lectures are the best platform to extend your teaching vocabulary. Many of the examples that I use in my presentations at marketing events originate from a discussion with students in my classroom.

14. Presenting Data Science Topics at large Analytic Conferences

In my career I had the chance to present statistics and data science topics at large analytics conferences in different locations. This has always been a great experience for me, because

  • It is highly rewarding to present your work and your analysis ideas to a large audience and experience their interest.
  • I always loved to share my work. Consequently I always made my slides and the content available to others so that they can benefit from it.
  • You receive important feedback. Presenting at conferences allows you to expose yourself to the critical discussion of other data scientists. This allows you to grow, get new inputs and ideas, and improve your work over the years.
  • It also teaches you how you can best present complex analytic topics in order to make them interesting and consumable for a diverse audience.

Here are some of my favorite presentations:

15. Writing the book “Data Preparation for Analytics Using SAS”

In spring 2003, when I walked to the office of a customer to work on a data mining project, I had an important idea that changed my work life. I realized that it might be a good idea to summarize my project findings, experience, code examples, and feature generation ideas in a book. In 2004 I started working on the book “Data Preparation for Analytics Using SAS”.

Would I do it again? YES!!! It has been a great experience to summarize the content and to compile it into a book. The book was published by SAS Press in 2007. There has been great response from SAS users to the book. I gave numerous presentations at conferences in Austria, Germany, the rest of Europe, and the US. And I also developed a course offering for SAS based on the book.

A lot of time and effort went into the writing of the book, when I structured my experience on the business background, analytics considerations, aggregations, transpositions, coding tips, case studies, feature engineering, and more. However, I always felt that it was a worthwhile investment. When I gave my first presentation on “Data Preparation for Analytics” at SAS Global Forum and saw that the room was packed with 100s of attendees, I knew that I was right. It has been extremely rewarding over the years to get feedback from customers on the book content and discuss it with them.

Would I do it again? DEFINITELY! Maybe I would name the book rather “Data Preparation for Data Science” these days ;-)

16. Analyzing and Predicting Customer Behavior in the Telecommunications Sector

It was back in July 1999, in the 2nd week after I started at SAS. The consulting manager approached me and assigned me as the lead statistician for the Churn Prediction project at a large Austrian mobile phone provider. I felt honored to be selected and assigned to that project. But I also had some respect, as this was definitely (one of) the largest data mining projects in customer analytics in Austria at that time, and I came directly from medical statistics and did not have real project experience in marketing and customer analytics.

However, it turned out to be a great combination: Gerhard + Telecommunications + Customer Behavior Analysis. For more than 10 years I worked on and managed a lot of consulting projects for Austrian telecommunication customers. And SAS and our customers were very successful with these projects:

  • From the first day on in consulting projects in the telecommunications sector I highly appreciated the openness, creativity, and flexibility of people working in this industry. When I worked in the medical statistics area at the University, all major decisions were made by people with a large number of working years. In the telco projects most of us were in our 30s and we made important decisions about campaign plans, analysis strategies, and IT implementations. For me this environment was quite a game changer and I replicated this open and collaborative attitude from the telecommunications industry in many of my other SAS projects.
  • We ran projects and analyses in different areas: churn prediction (of course ;-) ), customer segmentation, response profiling, and analyses to fine-tune campaign offerings, as well as market basket analyses, honeymoon analyses (how do customers change their product and service usage after their first 3 months), network analyses, customer lifetime values, and many others.
  • These real-world experiences in preparing and quality checking data for analytics were one of the triggers why I decided to write my first two books for SAS Press, “Data Preparation for Analytics Using SAS” and “Data Quality for Analytics Using SAS”. I gained so much experience in data mining and machine learning in these projects and I wanted to give it back to the SAS user community.

17. Telling Statistical Stories

In (3) and (9) I already explained that I like to teach and to explain statistical topics. In my role as data science consultant at SAS I am very often in the situation where I have to explain complicated topics to our customers or event attendees. These explanations have to be in wording that non-statistical listeners can understand and relate to their business environment.

This is important for us, as we want to get the approval for the implementation of projects, sell our software or receive the acceptance that we move further with our project ideas.

I enjoy explaining these topics to our customers and to my audience at analytics conferences. Below I have listed a selection of Blogs that I wrote in that context.

You can visit my LinkedIn Profile or the SAS-Blogs for more content. I would also like to thank my colleagues from SAS marketing for supporting and encouraging me in writing my blogs. Without their support in proofreading, layout, publishing, and promoting my blogs, my ideas would not shine in the same way as they do now.

18. Attending the SAS Global Technology and Product Manager Meetings

A month after I started at SAS I had the chance to travel to Heidelberg to our European headquarters for the first time. This was quite a game changer for future years in my career. I was lucky to experience an environment of open minded people, who discussed their project experiences, customer feedback, analytic trends, product development and many other topics.

These conversations were always a big boost in understanding and extending my horizon by listening to the other countries and regions and their customer success stories. We support each other in customer meetings and make sure that our customers receive information and updates that are most relevant for them.

Over the years I had the chance to travel frequently to Heidelberg, Cary NC, Rome, Amsterdam, Copenhagen, and many other locations to meet my colleagues. It is still a big motivation for my work to be together with them in a room. It is a gift to work as part of a group of world-class statisticians and data scientists.

19. Solving Statistical Problems and Performing Monte Carlo Simulations
