Home alone? Invest now in your data science skills, and help solve the problems of the moment and the future

Gerhard Svolba
12 min read · Mar 22, 2020

Providing a reference value that considers all available co-information (June 18th)

Are you also tired of getting this feedback from business people when presenting your data science results:

  • Yes, but in this month we often see higher values …
  • Yes, but in this segment there are much older customers …

This is often the first response, especially when you highlight outliers, anomalies or suspicious observations that are quite distant from the expected value.

Unlike simple one-size-fits-all averages, analytic methods let you calculate individual reference values that already account for seasonal patterns, trends, and other covariates of your analysis subjects.

Consequently, the highlighted observations are selected not because of their absolute values but because of their deviation from their individually expected values.
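To make the idea concrete, here is a minimal sketch in Python (used here only for illustration; the webinar itself works with SAS). It assumes the only co-information is the calendar month and uses the month-specific mean and standard deviation as the individual reference. The data and the function name `deviation_scores` are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev

def deviation_scores(records):
    """Score each (month, value) record by its distance from the
    month-specific mean, in units of the month-specific standard deviation."""
    by_month = defaultdict(list)
    for month, value in records:
        by_month[month].append(value)
    # One reference value (mean) and one spread (stdev) per month.
    stats = {m: (mean(v), stdev(v)) for m, v in by_month.items()}
    scored = []
    for month, value in records:
        mu, sd = stats[month]
        z = (value - mu) / sd if sd > 0 else 0.0
        scored.append((month, value, z))
    return scored

# Hypothetical data: December values are high by nature, July values are low.
records = [
    ("Dec", 100), ("Dec", 104), ("Dec", 98), ("Dec", 102),
    ("Jul", 40), ("Jul", 42), ("Jul", 38), ("Jul", 60),
]
scored = deviation_scores(records)
largest_absolute = max(scored, key=lambda r: r[1])       # a December value
most_unusual = max(scored, key=lambda r: abs(r[2]))      # a July value
print(largest_absolute, most_unusual)
```

The largest absolute value is a perfectly normal December observation, while the record that deviates most from its own expectation is the July value of 60. A real application would replace the monthly mean with a model that also captures trend and other covariates.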

Watch my Youtube Video

Water Level at Lake Neusiedl is critically low! Getting insight with a what-if analysis and investigating scenarios (June 4th and 11th)

My webinar video this week shows how I used a simple concatenation model and a linear regression model to run simulation scenarios for the water level at Lake Neusiedl in June, July, August and September this year.

The concatenation model simply connects the time series of the current year up to the end of May with the time series of each of the last 20 years from June onward. It answers the question “What happens if the water level from now on develops as it did in 2019, 2018, …?”
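A concatenation of this kind could be sketched as follows in Python (an illustration, not the SAS code from the webinar). One assumption goes beyond the description: the historical path is shifted so that it continues from the current end-of-May level, which means only the historical month-to-month changes are reused. All levels are made-up numbers, and `concatenate_scenarios` is a hypothetical name.

```python
def concatenate_scenarios(current_year, history, cutoff_month=5):
    """Attach each historical year's path after the cutoff month to the
    current year's observations up to the cutoff month.

    current_year: dict month -> level observed so far this year
    history:      dict year -> dict month -> level
    Assumption: each historical path is shifted to start at the current
    level, so only its month-to-month changes are reused.
    """
    observed = [(m, current_year[m]) for m in sorted(current_year) if m <= cutoff_month]
    last_level = current_year[cutoff_month]
    scenarios = {}
    for year, series in history.items():
        offset = last_level - series[cutoff_month]
        future = [(m, series[m] + offset) for m in sorted(series) if m > cutoff_month]
        scenarios[year] = observed + future
    return scenarios

# Hypothetical levels in metres above sea level (not real measurements).
current_year = {1: 115.30, 2: 115.32, 3: 115.33, 4: 115.28, 5: 115.25}
history = {
    2018: {5: 115.60, 6: 115.58, 7: 115.50, 8: 115.42, 9: 115.40},
    2019: {5: 115.50, 6: 115.45, 7: 115.35, 8: 115.28, 9: 115.25},
}
scenarios = concatenate_scenarios(current_year, history)
for year, path in sorted(scenarios.items()):
    print(year, [round(level, 2) for _, level in path])
```

Each scenario answers one "what if this year continues like year X" question; plotting all of them together gives a fan of plausible paths for June to September.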

I also built a simple regression model based on two variables: the monthly rain sum and the number of days above 25°C per month. This model is used in SAS Visual Analytics, where you can use sliders to specify values for “rain sum” and “days >25°C” for the coming months and immediately see the outcome.
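The mechanics of such a two-variable regression and its what-if use can be sketched in plain Python (not the SAS Visual Analytics model from the webinar). The target here is assumed to be the monthly change in water level in centimetres; the training data is invented and constructed to follow an exact linear rule, so the fitted coefficients can be checked. `fit_ols` and `predict` are hypothetical helper names.

```python
def fit_ols(rows, y):
    """Ordinary least squares with an intercept, solved via the normal
    equations (X'X) b = X'y using Gauss-Jordan elimination."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    a = [row + [b] for row, b in zip(xtx, xty)]  # augmented matrix
    for col in range(k):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def predict(coef, rain_sum, hot_days):
    """Scenario value for chosen slider settings."""
    return coef[0] + coef[1] * rain_sum + coef[2] * hot_days

# Hypothetical months: (rain sum in mm, number of days above 25°C).
rows = [(20, 10), (60, 5), (40, 15), (80, 2), (10, 20), (50, 8)]
# Invented level changes in cm, built as -2 + 0.05 * rain - 0.5 * hot_days.
y = [-6.0, -1.5, -7.5, 1.0, -11.5, -3.5]

coef = fit_ols(rows, y)
scenario = predict(coef, rain_sum=30, hot_days=12)
print([round(c, 4) for c in coef], round(scenario, 2))
```

Calling `predict` with different `rain_sum` and `hot_days` values plays the role of moving the sliders: the model is fitted once and then evaluated instantly for every scenario.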

Watch the Youtube recording here in English.

Youtube Video in German

Detecting Structural Changes in Longitudinal Data (May 29th)

This week I was invited by Allan Bowe from sasensei to one of the SID meetings to present on detecting structural change. It was great to present and discuss with fellow SAS programmers and analytics people during the session and on social media afterwards.

You can find the recording of the session on Youtube

This session also relates to content that I previously published in my home alone data science webinar:

Automatically highlight data-driven events with reference lines in line-charts

Simulate timeseries data with a SAS DATA Step and SAS Functions

3 ways to consider movable holidays in SAS

https://support.sas.com/documentation/onlinedoc/ets/132/ucm.pdf

Home Alone Data Science Webinar Series by Gerhard

Detecting Structural Changes and Outliers in Longitudinal Data

https://github.com/gerhard1050/Applying-Data-Science-Using-SAS

•#115 at https://github.com/gerhard1050/DataScience-Presentations-By-Gerhard

Self Service Analytics made easy with SAS Viya

In the week before our local (virtual) SAS Forum here in the Germany/Austria/Switzerland region, I am publishing a software demo and presentation video in German.

No worries if you do not speak German: the content can easily be followed from the demo screens.

  • Get to know your analysis data
  • Perform simple profiling to understand univariate relationships in the data
  • Create a first simple predictive model
  • Refine your predictive model
  • Perform interactive scenario analysis for different cutoff points.

Guest contribution on Computer Vision from my SAS Colleague Michael Gorkow on May 17th:

This week I want to feature the great work on computer vision from my SAS colleague Michael Gorkow. He applied deep learning methods for computer vision and demonstrated three use cases:

Aircraft Turnaround Management with SAS Event Stream Processing. Here I especially like the fact that his work makes it possible to automatically collect data that we need as features in machine learning models and time series forecasting. How long does the unloading take? How many minutes were spent on refueling?

Tracking Social Distancing on Camera Videos uses computer vision to automatically calculate a measure of the distance between people in the picture. This is not just another computer vision gimmick! It makes it possible to monitor locations over time and automatically generates measures that indicate at which places and at what times distancing rules are not observed.

The example “Investigate Images from Social Media to detect Weapons” shows how SAS Visual Analytics can serve as an interactive interface to detect regional hotspots where pictures posted on Facebook, WhatsApp and WeChat contain undesired content.

Also read Michael's articles on Medium.

New content on May 7th: Can events and changes in the course over time be automatically detected in your data?

Youtube Video — Detecting Structural Changes and Outliers in Longitudinal Data

This case study shows how analytical methods can be used to automatically detect events and changes in the course of longitudinal data. Example time series data with the number of airline passengers and data from a long-term clinical trial are used to illustrate how data can be smoothed and breakpoints and outliers can be detected.

Analytical methods like smoothing of longitudinal data, multivariate adaptive regression splines, automatic breakpoint detection, automatic detection of outliers, and ARIMA models are applied to answer this business question. Events in the line graphs are automatically highlighted using reference lines.
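To give a feel for what automatic breakpoint detection does, here is a deliberately simple Python sketch. It is a stand-in, not one of the methods named above: it searches for the single split point that minimizes the squared deviations from the two segment means (a plain mean-shift search). The series and the name `best_breakpoint` are hypothetical.

```python
def best_breakpoint(series, min_size=2):
    """Return the index k that splits the series into series[:k] and
    series[k:] with the smallest total within-segment sum of squares."""
    def sse(segment):
        m = sum(segment) / len(segment)
        return sum((v - m) ** 2 for v in segment)

    candidates = range(min_size, len(series) - min_size + 1)
    return min(candidates, key=lambda k: sse(series[:k]) + sse(series[k:]))

# Hypothetical series with a level shift after the fifth observation.
series = [10, 11, 9, 10, 11, 20, 21, 19, 20, 22]
k = best_breakpoint(series)
print("structural change starts at index", k)
```

The returned index is exactly the kind of data-driven position at which a reference line can be drawn in the line chart. Trend breaks, multiple breakpoints and seasonal data need the more capable methods listed in the paragraph above.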

Read also the related article on SAS Communities that explains how to automatically add data-driven reference lines in line charts.

New content on April 30th: Use Data Science Methods to check the Alignment of your processes with Predefined Patterns

Youtube Video: Which of your customers show behavior that is far from what you expected? Would you like to receive a ranked list of the analysis subjects with the highest deviation from “normal” or “expected”? Data science methods can provide this. Learn how analytical methods like the Chi2-test let you quantify the deviation between the assumed and the actual distribution. You see how analytics helps you move from “noise” to manageable segments.
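The ranking idea can be sketched with the Pearson chi-square statistic in a few lines of Python (an illustration only, with invented data). The assumed distribution is a set of hypothetical shares across four channels, and each customer's observed counts are compared against it. Customer names, shares and counts are all made up.

```python
def chi_square_statistic(observed, expected_share):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected,
    where the expected counts are the assumed shares times the total."""
    total = sum(observed)
    return sum(
        (o - total * p) ** 2 / (total * p)
        for o, p in zip(observed, expected_share)
    )

# Assumed ("expected") distribution of activity across four channels.
expected_share = [0.4, 0.3, 0.2, 0.1]

# Hypothetical observed counts per customer.
customers = {
    "A": [40, 30, 20, 10],   # matches the assumption exactly
    "B": [10, 30, 20, 40],   # first and last channel swapped
    "C": [35, 35, 20, 10],   # small deviation
}

scores = {name: chi_square_statistic(obs, expected_share) for name, obs in customers.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
for name in ranking:
    print(name, round(scores[name], 2))
```

Note that the statistic grows with the number of observations, so subjects with very different totals are better compared via p-values or a normalized measure than via the raw statistic.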

New content on April 24th: Listening to Your Data! Discover Relationships with Unsupervised Analysis Methods

Men do not drive sports cars? Really? Can your data tell you stories about your analysis subjects, even if you don’t ask explicitly? Youtube Video

This case study shows how you can receive answers from your data, even if you do not ask every question in detail. You see which features and properties in the data are closely related. Learn how unsupervised machine learning methods like association analysis and variable clustering are used to generate rules and uncover relationships between the features of your analysis subjects.
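The three standard measures behind an association rule (support, confidence and lift) can be computed by hand, as in this small Python sketch. The eight records are invented and do not come from the case study; `rule_metrics` is a hypothetical helper name.

```python
def rule_metrics(records, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent.
    Each record is a set of properties of one analysis subject."""
    n = len(records)
    n_antecedent = sum(1 for r in records if antecedent <= r)
    n_consequent = sum(1 for r in records if consequent <= r)
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    support = n_both / n                      # share of records with both sides
    confidence = n_both / n_antecedent        # P(consequent | antecedent)
    lift = confidence / (n_consequent / n)    # confidence relative to base rate
    return support, confidence, lift

# Hypothetical records, one set of properties per customer.
records = [
    {"male", "suv"}, {"male", "suv"}, {"male", "sports_car"}, {"male", "suv"},
    {"female", "sports_car"}, {"female", "sports_car"},
    {"female", "suv"}, {"female", "sports_car"},
]
support, confidence, lift = rule_metrics(records, {"male"}, {"suv"})
print(support, confidence, lift)
```

A lift above 1 means the consequent occurs more often together with the antecedent than its overall frequency would suggest. Association analysis software enumerates many candidate rules and keeps those that pass thresholds on these measures.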

Article on SAS Communities how you can Use SAS Enterprise Miner for Predictive Modeling Simulation Case Studies

New content on April 17th: How long will Gerhard still stay with our company?

Youtube Video featuring the application of survival analysis methods for employee retention.

Recently, an increasing number of employees have quit their jobs! The general manager of the company therefore wants a clearer picture of the average retention period of the employees and of the factors that potentially influence its length. But can assumptions about the average length of time intervals be made even if most of the endpoints have not yet been observed (because employees are still with the company)?

Learn how survival analysis methods like Kaplan-Meier estimates and Cox Proportional Hazards regression answer business questions like:

  • What is the average retention period for employees in the company?
  • How can the retention period be visualized and compared between different groups?
  • Are there influential factors for the length of the retention period?
  • Can the expected survival period for an employee be predicted?
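How a Kaplan-Meier estimate handles employees who are still with the company (censored observations) can be shown in a short Python sketch. It implements the standard product-limit formula on invented data; it is not the SAS code used in the video, and `kaplan_meier` is a hypothetical name.

```python
def kaplan_meier(durations, left_company):
    """Product-limit estimate of the retention curve.

    durations:    months each employee has been observed
    left_company: True if the employee quit at that time (event),
                  False if still employed (censored)
    Returns a list of (time, estimated share still employed).
    """
    event_times = sorted({t for t, e in zip(durations, left_company) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, left_company) if d == t and e)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical data: eight employees, four of them still with the company.
durations = [6, 12, 12, 18, 24, 24, 30, 36]
left_company = [True, True, False, True, False, True, False, False]

curve = kaplan_meier(durations, left_company)
median_retention = next((t for t, s in curve if s <= 0.5), None)
for t, s in curve:
    print(t, round(s, 3))
print("median retention (months):", median_retention)
```

Censored employees stay in the risk set until their last observed month and then drop out without counting as an event, which is how the method uses their information without knowing their endpoint.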

Article on SAS Communities how you can simulate time series data that contain certain features: Simulate timeseries data with a SAS DATA Step and SAS Functions

April 9th: New content added this week!

3 ways to consider movable holidays in feature engineering

Learn why, for my friend George, his mother-in-law is a nightmare: Article at SAS Communities: 3 ways to consider movable holidays in SAS
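As a rough illustration of why movable holidays need special handling in feature engineering, the following Python sketch computes Easter Sunday with the anonymous Gregorian (Meeus/Jones/Butcher) algorithm and derives a few Easter-dependent holidays from it. This is one possible approach and is not claimed to be one of the three ways described in the article; the function names and the selection of holidays are assumptions.

```python
from datetime import date, timedelta

def easter_sunday(year):
    """Easter Sunday in the Gregorian calendar (anonymous Gregorian algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)

def movable_holidays(year):
    """Holidays whose date is defined relative to Easter Sunday."""
    easter = easter_sunday(year)
    return {
        "easter_sunday": easter,
        "easter_monday": easter + timedelta(days=1),
        "ascension_day": easter + timedelta(days=39),
        "whit_monday": easter + timedelta(days=50),
    }

def is_movable_holiday(day):
    """Binary feature for a daily time series."""
    return day in movable_holidays(day.year).values()

print(movable_holidays(2020))
```

Because these dates shift between March, April, May and June from year to year, a plain month or week-of-year variable cannot capture their effect; an explicit holiday flag like `is_movable_holiday` can.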

Youtube: Getting More Insight into Your Forecast Errors using Multivariate Statistics

  • Is it sufficient just to monitor the quality of your forecast models over time or is it better to identify the drivers for large forecast errors?
  • Do demand planners really improve forecast accuracy with their manual overwrites?

The video Getting More Insight into Your Forecast Errors using Multivariate Statistics and the accompanying SAS Global Forum paper answer these two questions.

Using a real-life case study, the paper shows how you can study the impact of factors like product group, forecast horizon, seasonality, or forecast model type on forecast accuracy and convert them into actionable results. You learn how univariate methods provide first insights into the structure and relationships of your forecast data.

You gain insight into how manual overwrites of the statistical forecast change forecast accuracy in both directions and how you use analytical and graphical methods to illustrate these findings. You see how multivariate analytical methods like linear and quantile regression provide additional relevant insight.

Main Article:

It is easy to forget in moments of crisis that there are a number of tools at our disposal to help us solve problems. Analytics in particular is a versatile tool that can be, and is being, used to help us understand the world around us and manage problems.

Over the last few years I have presented many data science and machine learning papers at numerous conferences. Assuming that many of you are currently working from home, I want to give you the chance to consume this content via webinars. Therefore I am opening the pool of my data science conference presentations and (re-)presenting them virtually. The sessions deal with relevant data science applications and experiences, so you don’t have to miss the data science conversations that you usually have with your peers in the office.

100% COVID free! At the moment there is a big focus in blogs and articles on the analysis and prediction of the effect and the course of the COVID virus. My webinar will not focus on the COVID topic. The aim of my webinar is to share information about advanced techniques that we have found useful in previous projects, including very recent work. Each of them is a stand-alone topic, so there is no need to look at them all, or in any particular order.

Let me know by commenting below if there are particular tools, techniques or ideas that you want covered. I will try to source and prepare material on those and we will publish it as soon as we can.

Schedule and Content

The first session of the webinar is scheduled to be published on Thursday, March 26th. Over the next days the schedule and download links will be published here. Stay tuned!

Monte-Carlo Simulations to better understand the outcome distribution

Available on Youtube: Using Monte Carlo Simulations to Understand the Outcome Distribution — “The Sales Managers’ Problem”

When the sales manager looks at the project pipeline, does the sum of weighted averages give him or her a full picture? Or would the sales manager rather like to see the distribution of the possible outcomes to get a more informed picture?
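The difference between the weighted sum and the full outcome distribution can be seen in a small Monte Carlo sketch in Python (an illustration, not the code from the video). The pipeline of four deals with their values and win probabilities is invented, and each deal is assumed to be won or lost independently of the others.

```python
import random

def simulate_pipeline(deals, n_runs=10_000, seed=42):
    """Simulate total revenue: in each run every deal is won with its own
    probability, independently of the others. Returns the sorted totals."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        totals.append(sum(value for value, p in deals if rng.random() < p))
    totals.sort()
    return totals

# Hypothetical pipeline: (deal value, probability of winning the deal).
deals = [(500_000, 0.2), (100_000, 0.8), (50_000, 0.5), (250_000, 0.4)]

weighted_sum = sum(value * p for value, p in deals)
totals = simulate_pipeline(deals)
p10 = totals[int(0.10 * len(totals))]
p50 = totals[int(0.50 * len(totals))]
p90 = totals[int(0.90 * len(totals))]
print("weighted sum:", weighted_sum)
print("10th / 50th / 90th percentile:", p10, p50, p90)
```

The weighted sum is a single number that may never occur as an actual outcome, while the percentiles show how wide the range of plausible results is when a few large deals dominate the pipeline.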

Communicating Analytical Results and Interpreting Machine Learning Models with SAS Viya (March 26th)

Available on the DACH Youtube Channel: Communicating Analytical Results and Interpreting Machine Learning Models with SAS Viya

The tricks and tips include:

  1. Performing interactive cutoff analysis to visualize the outcome of predictive models
  2. Using a decision tree to “explain” complex machine learning models
  3. Turning on the model interpretability charts like partial dependence plots, ICE plots and LIME charts
  4. Quantifying the importance of explanatory variables in € and $
  5. Displaying the (hidden) regression coefficient of categorical input variables
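As background to the interpretability charts in the list above, a partial dependence curve can be computed by hand for any model that returns predictions. The Python sketch below shows the basic recipe under simple assumptions; the "model" is a made-up function standing in for a black-box learner, and SAS Viya produces these charts without any such code.

```python
def partial_dependence(model, rows, feature_index, grid):
    """For each grid value, set the chosen feature to that value in every
    row, predict, and average the predictions over all rows."""
    curve = []
    for value in grid:
        predictions = []
        for row in rows:
            modified = list(row)
            modified[feature_index] = value
            predictions.append(model(modified))
        curve.append((value, sum(predictions) / len(predictions)))
    return curve

# Hypothetical black-box model with an interaction between two inputs.
def model(x):
    return 2 * x[0] + x[0] * x[1]

rows = [(1, 0), (2, 1), (3, 2)]   # invented training rows
curve = partial_dependence(model, rows, feature_index=0, grid=[0, 1, 2])
print(curve)
```

The individual per-row prediction lines that are averaged here are what an ICE plot shows one by one, which is why the two chart types are usually offered together.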

Future Topics

Here is a selection of content that will be covered in my webinars over the next weeks:

Applying Data Science — Ten Things Advanced Analytics and Data Science can do for your business

Abstract: Will the sales manager keep his job, when we look at his sales pipeline? Can your data tell you stories even if you do not ask all questions explicitly? Can you automatically detect outliers and breakpoints over time that might indicate fraudulent behavior? Can you deal with missing endpoints and predict the expected retention time of your employees? This presentation shows how you can use data science methods to leverage the huge amounts of data you collect about your customers or your processes. You see practical examples of how machine learning techniques uncover facts that help you to make better business decisions. The examples are taken from the book “Applying Data Science: Business Case Studies Using SAS”.

Title: Want an Early Picture of the Data Quality Status of Your Analysis Data? — SAS® Visual Analytics Shows You How

Abstract: When you are analyzing your data and building your models, you often find out that the data cannot be used in the intended way. Systematic patterns, incomplete data, and inconsistencies from a business point of view are often the reason. You wish you could get a complete picture of the quality status of your data much earlier in the analytic lifecycle. SAS® analytics tools like SAS® Visual Analytics help you to profile and visualize the quality status of your data in an easy and powerful way. In this session, you learn advanced methods for analytic data quality profiling. You see case studies based on real-life data, where we look at time series data from a bird’s-eye view and interactively profile GPS trackpoint data from a sail race.

Additional Content for SAS Users and SAS Programmers

Over the next weeks I am going to publish one article per week on SAS Communities. I started this publication series on March 18th. The plan is to publish an article every Wednesday.

March 18th: Encoding of CLASS Variables in Regression Analysis — Better understand the ORDINAL encoding

March 26th: Display the hidden estimate for the reference category in EFFECT coding for better interpretability

April 2nd: %CALC_REFERENCE_CATEGORY displays the “hidden” coefficient in EFFECT encoding for CLASS variables
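The reason the reference category's coefficient is "hidden" but recoverable can be shown with a small Python sketch. It is not the %CALC_REFERENCE_CATEGORY macro and makes a simplifying assumption: a linear model with a single categorical input. In that case the effect-coded intercept is the unweighted mean of the group means, each displayed coefficient is a group mean minus that intercept, and the reference coefficient is the negative sum of the displayed ones. Data and names are invented.

```python
from statistics import mean

def effect_coding_estimates(groups, reference):
    """Estimates of a one-way linear model under effect (deviation) coding.

    groups:    dict level -> list of response values
    reference: the level whose coefficient is not displayed
    Returns (intercept, displayed coefficients, hidden reference coefficient).
    """
    group_means = {level: mean(values) for level, values in groups.items()}
    intercept = mean(list(group_means.values()))
    displayed = {
        level: m - intercept
        for level, m in group_means.items()
        if level != reference
    }
    # All effect-coded coefficients sum to zero, so the reference
    # coefficient is the negative sum of the displayed ones.
    hidden = -sum(displayed.values())
    return intercept, displayed, hidden

# Hypothetical response values for a CLASS variable with three levels.
groups = {"A": [10, 12], "B": [20, 22], "C": [30, 32, 34]}
intercept, displayed, hidden = effect_coding_estimates(groups, reference="C")
print(round(intercept, 3), displayed, round(hidden, 3))
```

The same sum-to-zero argument is what allows the missing estimate to be reconstructed from a regression output table that lists only the non-reference levels.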

Gerhard Svolba

Applying data science and machine learning methods-Generating relevant findings to better understand business processes