Data Analysis for Engineers Training
This specialized training course, organized for the Saudi Council of Engineers, was designed to equip engineers with essential skills and knowledge in data analysis. It covered a wide range of topics, from reading data, descriptive analysis, data analysis processes, identifying problem types, and dealing with outliers and missing data to data visualization, regression, simulation, time series, and reliability analysis, giving a comprehensive overview of the aspects involved in data analysis for engineers.
Course Content
The course was structured into several key modules:
- Types of Data: Categorical (Ordinal, Nominal), Numerical (Discrete, Continuous – Interval, Ratio)
- Reading Data: CSV (Online, From desktop), Excel, Scraping (Dynamic pages – Selenium, Static pages – rvest library), Financial Data (quantmod), Databases
- Descriptive Analysis: Classic (Central Tendency – Mean, Variability – Standard Deviation), Resistant (Central Tendency – Median, Variability – IQR), Distribution (graphical – Frequency – Histogram, Probability – CDF, PDF, Skewness, Kurtosis)
- Data Analysis Processes: Google (Ask, Prepare, Process, Analyze, Share, Act), Research (Generation, Collection, Processing, Storage, Management, Analysis, Visualization, Interpretation), SAS (Ask, Prepare, Explore, Model, Implement, Act, Evaluate), EMC (Discovery, Pre-processing data, Model planning, Model building, Communicate results, Operationalize), Big Data (Business case evaluation, Data identification, Data acquisition and filtering, Data extraction, Data validation and cleaning, Data aggregation and representation, Data analysis, Data visualization, Utilization of analysis results), Project Based (Identifying the problem, Designing data requirements, Pre-processing data, Performing data analysis, Visualizing data)
- Problem Types: Making predictions, Categorizing things, Spotting something unusual, Identifying themes, Discovering connections, Finding patterns
- Outliers: Global, Contextual, Collective
- Missing Data: Types (MCAR, MAR, MNAR)
- Correlation: Normal data (Pearson), Non-normal data (Kendall’s tau, Spearman’s rho)
- Data Visualization: Chart Types, Misleading charts (Graph is accurate but misleading – giving an irrelevant correlation, ignoring other variables, “cherry picking” data, graph is not properly labelled; Scale is distorted – scale does not start at zero, scale is enlarged, scale is too small, there is no scale or units are missing; Graph is not accurately drawn or distorts the information – pieces of the graph appear larger or smaller than they should (3D), percentages do not add up to 100, pieces in a chart are not in the correct ratios, graph is drawn upside down, units are not evenly or proportionally spaced), Chart Elements, Visual Elements
- Regression
- Simulation
- Time Series
- Reliability Analysis: Parametric (Weibull – Given parameter, Given data), Non-parametric, Component Level (Usage – PM optimization, Predict Failure, Spare parts Optimization; Method – Parametric, Non-parametric – Kaplan-Meier analysis), System Level (Time Value of Money Analysis)
- Statistical Process Control (SPC): Process Capability Cp, Cpk, Process Performance Pp, Ppk, Control Chart (Western Electric rules)
- Opportunity Analysis: Porter’s Value Chain, P&L Analysis
- GIS: sf, tmap, ggplot2
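To make two of the modules above concrete (Descriptive Analysis and Statistical Process Control), here is a minimal sketch. It is not taken from the course material. It is written in Python purely for illustration, although the libraries named in the outline (such as rvest) point to R as the course tooling. The measurement values, specification limits, and function names are hypothetical. The sketch contrasts classic measures (mean, standard deviation) with resistant ones (median, IQR) when one outlier is present, and computes Pp and Ppk from the overall sample standard deviation. Cp and Cpk use the same formulas but with a within-subgroup estimate of the standard deviation, which this sketch does not attempt.

```python
import statistics


def classic_summary(values):
    """Classic measures: mean and sample standard deviation."""
    return statistics.mean(values), statistics.stdev(values)


def resistant_summary(values):
    """Resistant measures: median and interquartile range (IQR)."""
    # quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3).
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), q3 - q1


def process_performance(values, lsl, usl):
    """Pp and Ppk from the overall sample mean and standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    pp = (usl - lsl) / (6 * sd)
    ppk = min(usl - mean, mean - lsl) / (3 * sd)
    return pp, ppk


# Hypothetical shaft diameters in mm; the last reading is a deliberate outlier.
clean = [9.8, 9.9, 9.9, 10.0, 10.0, 10.1, 10.1, 10.2]
with_outlier = clean + [15.0]

print("classic  :", classic_summary(with_outlier))
print("resistant:", resistant_summary(with_outlier))

# Hypothetical specification limits for the clean data.
print("Pp, Ppk  :", process_performance(clean, lsl=9.4, usl=10.6))
```

The single outlier pulls the mean and standard deviation noticeably, while the median stays at the centre of the bulk of the data. That is the practical reason for reporting resistant measures alongside classic ones. When the process is centred between the limits, Pp and Ppk coincide; an off-centre process yields a Ppk lower than Pp.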
Outcome
By the end of the training, participants were equipped to analyze data effectively, make informed decisions, and optimize processes in their engineering practice. They were able to work with different types of data, perform descriptive analysis, follow established data analysis processes, identify problem types, handle outliers and missing data, interpret correlation, create effective data visualizations, perform regression analysis, run simulations, analyze time series data, conduct reliability analysis, apply statistical process control, perform opportunity analysis, and use GIS tools for spatial data analysis.