Introduction

Welcome to this beginner-friendly workshop designed especially for medical students and healthcare professionals. Over three immersive days, you’ll learn Python from the ground up — not just as a programming language, but as a practical tool for analyzing clinical and research data.


Workshop Overview

Python has become one of the most widely used languages in healthcare research and medical informatics. From analyzing patient data and visualizing disease trends to supporting AI-based diagnostics, Python is transforming how health professionals handle data.

This workshop provides a hands-on, structured approach to learning Python fundamentals and applying them to real-world medical and research datasets.

Duration & Format
  • Duration: 3-day intensive workshop
  • Format: Hands-on learning with practical exercises
  • Schedule: October 2025
  • Delivery: Interactive sessions with live coding demonstrations

Learning Objectives

By the end of this workshop, participants will be able to:

Technical Skills
  • Handle Data Effectively: Organize and manage patient or laboratory data using Python’s built-in structures
  • Perform Data Analysis: Use pandas to explore health-related datasets and compute descriptive statistics
  • Create Visualizations: Generate charts for patient outcomes, clinical trials, or epidemiological trends
  • Perform Statistical Tests: Apply tests such as t-tests, ANOVA, chi-square, and logistic regression to biomedical datasets
  • Build a Foundation: Prepare for advanced applications like biostatistics, machine learning, and clinical decision support
Practical Applications
  • Read and process data from various file formats
  • Filter and query datasets to extract meaningful insights
  • Perform statistical tests (t-tests, ANOVA, chi-square, logistic regression)
  • Create professional data visualizations
  • Build a foundation for advanced Python applications

Target Audience

This workshop is specially designed for students of the Davao Medical School Foundation, Inc. (DMSFI) and for healthcare professionals who are ready to explore how programming and data can enhance medical decision-making, research, and innovation.

Primary Audience
  • DMSFI students
  • Complete beginners with no prior programming experience
  • Professionals looking to add Python skills to their toolkit
  • Researchers who need to analyze data programmatically
Prerequisites
  • No programming experience required; we start from the very basics
  • Basic computer literacy (file management, using applications)
  • Willingness to learn and practice hands-on coding
  • A laptop with internet connection for installation and practice
Who Will Benefit Most
  • Medical professionals aiming to integrate data science into clinical practice or research
  • Medical students preparing for coursework or electives involving biostatistics, informatics, or computational tools
  • Clinicians and researchers seeking to enhance decision-making through data-driven approaches
  • Healthcare professionals curious about programming, data analysis, and their applications in medicine

Expected Outcomes

Immediate Skills (End of Workshop)
  • Confidently write basic to intermediate Python programs
  • Understand and apply fundamental programming concepts
  • Perform basic data manipulation and analysis tasks
  • Create simple visualizations from datasets
Long-term Benefits: Clinical & Research Impact
  • Data-Driven Insight: Confidently explore and interpret clinical or survey data
  • Research Independence: Perform your own data cleaning and analysis for publications
  • Problem-Solving: Develop computational thinking skills
  • Foundation Building: Prepare for advanced topics like machine learning
  • Automation in Practice: Streamline repetitive workflows, such as summarizing lab data or generating reports
Practical Projects

By workshop completion, you’ll have created:

  • A personal library of useful Python functions
  • Mini-projects analyzing sample clinical datasets (e.g., patient demographics, vital signs, or lab results)
  • Solutions to data-driven questions often encountered in healthcare and research
  • A foundation for continued learning and development

Workshop Structure

Day 1: Foundation Building
  • Python basics and development environment setup
  • Variables, data types, and basic operations
  • Introduction to programming logic
Day 2: Building Complexity
  • Control structures and functions
  • Working with modules and handling errors
  • Hands-on practice with increasingly complex problems
Day 3: Real-World Application
  • Data analysis with pandas
  • Statistical analysis and visualization
  • Capstone project and wrap-up

What You’ll Take Away

Skills Portfolio
  • Comprehensive understanding of Python fundamentals
  • Experience with industry-standard data analysis tools
  • Portfolio of completed programming projects
  • Confidence to tackle new programming challenges
Resources for Continued Learning
  • Complete workshop materials and code examples
  • Recommended resources for further study
  • Clear pathways for advancing your Python skills
Certificate of Completion

Participants who complete all workshop activities will receive a Certificate of Completion in “Data Analysis in Python for Complete Beginners,” issued by the Center for Research and Development, DMSFI.


Definition of Terms

I. Programming and Data Basics
  • Variable – A named container used to store data values in a program.
  • Data Type – The classification of data that tells Python what kind of value is being stored (e.g., integer, float, string, boolean).
  • List – An ordered, mutable collection of items (e.g., [10, 20, 30]).
  • Dictionary – A collection of key-value pairs (e.g., {'name': 'John', 'age': 30}).
  • Function – A reusable block of code that performs a specific task.
  • Module / Library – A collection of pre-written Python functions and tools (e.g., numpy, pandas, matplotlib).
  • DataFrame – A two-dimensional, labeled data structure in pandas similar to an Excel spreadsheet, with rows and columns.
  • Series – A one-dimensional labeled array (e.g., a single column of a DataFrame).
  • Control Structures – Logical structures that determine the order in which code executes, such as loops (for, while) and conditionals (if, else).
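
To make these terms concrete, here is a minimal sketch in Python; the names and values are invented for illustration and are not from the workshop datasets:

    # Variables with different data types
    age = 30                      # integer
    temperature = 37.5            # float
    name = 'John'                 # string
    is_admitted = True            # boolean

    # A list (ordered, mutable) and a dictionary (key-value pairs)
    heart_rates = [72, 88, 104]
    patient = {'name': 'John', 'age': 30}

    # A function: a reusable block of code
    def bmi(weight_kg, height_m):
        return weight_kg / height_m ** 2

    # Control structures: a loop over the list with a conditional
    for hr in heart_rates:
        if hr > 100:
            print(hr, 'is tachycardic')

    # A pandas DataFrame (rows and columns); one column is a Series
    import pandas as pd
    df = pd.DataFrame({'name': ['Ana', 'Ben'], 'age': [25, 34]})
    ages = df['age']              # a Series
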
II. Data Querying and Selection
  • Querying – Extracting specific subsets of data based on certain criteria or conditions.
  • Filtering – Selecting rows or columns in a dataset that meet a particular condition (e.g., heart rate greater than 100 BPM).
  • Indexing – Accessing data using row or column labels or positions.
  • iloc – Index-based selection (integer position).
  • loc – Label-based selection (using column or row names).
  • Boolean Masking – Using logical conditions (True/False) to filter data.
  • isin() Function – Filters rows where a column’s value matches any in a given list.
  • String Operations – Text-based filters or transformations (e.g., str.contains() to find names with a certain substring).
  • query() Method – A more readable way to filter data using string expressions (e.g., df.query('age > 30 and heart_rate > 100')).
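
A short sketch of these selection tools on a toy DataFrame, assuming pandas is installed; the column names and values are invented:

    import pandas as pd

    df = pd.DataFrame({
        'name': ['Ana', 'Ben', 'Cara'],
        'age': [25, 34, 41],
        'heart_rate': [72, 110, 95],
    })

    # Boolean masking: filter rows where heart rate exceeds 100 BPM
    tachycardic = df[df['heart_rate'] > 100]

    # loc (label-based) vs. iloc (integer position-based) indexing
    first_name = df.loc[0, 'name']
    same_cell = df.iloc[0, 0]

    # isin(): rows whose value matches any item in a list
    subset = df[df['name'].isin(['Ana', 'Cara'])]

    # String operations: names containing a substring
    a_names = df[df['name'].str.contains('a')]

    # query(): the same kind of filter as a readable string expression
    result = df.query('age > 30 and heart_rate > 100')
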
III. Data Transformation and Aggregation
  • Grouping (groupby) – Combining rows with the same value in one or more columns to perform summary calculations.
  • Aggregation – Calculating summary statistics such as mean, median, count, or standard deviation for each group.
  • Custom Aggregation – A user-defined summary function applied to each group (e.g., computing the range of heart rates within each ward).
  • Sorting – Arranging rows in ascending or descending order based on column values.
  • Chained Operations – Combining multiple DataFrame methods in a single line (e.g., filtering → sorting → selecting).
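
A brief sketch of grouping, aggregation, sorting, and chaining on a toy dataset; the ward labels and values are invented:

    import pandas as pd

    df = pd.DataFrame({
        'ward': ['A', 'A', 'B', 'B'],
        'heart_rate': [72, 88, 110, 95],
    })

    # Grouping + aggregation: mean and count per ward
    summary = df.groupby('ward')['heart_rate'].agg(['mean', 'count'])

    # Custom aggregation: a user-defined function applied to each group
    hr_range = df.groupby('ward')['heart_rate'].agg(lambda x: x.max() - x.min())

    # Sorting, then chaining filter -> sort -> select in one expression
    sorted_df = df.sort_values('heart_rate', ascending=False)
    top = df[df['heart_rate'] > 80].sort_values('heart_rate')[['ward', 'heart_rate']]
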
IV. Data Visualization
  • Matplotlib – A powerful Python library for creating static, 2D plots like histograms, scatter plots, and bar charts.
  • Seaborn – A visualization library built on Matplotlib that simplifies statistical plotting with cleaner aesthetics.
  • Figure – The entire plotting area that may contain one or more subplots.
  • Axes / Subplot – Individual plots within a figure.
  • Histogram – Displays the frequency distribution of a numeric variable.
  • Box Plot – Visualizes data spread and detects outliers using quartiles.
  • Scatter Plot – Shows relationships or correlations between two numeric variables.
  • Bar Plot – Represents categorical data using rectangular bars.
  • Violin Plot – Combines box plot and density plot to show data distribution.
  • Heatmap – Displays a color-coded correlation matrix or table of values.
  • Pie Chart – Represents proportions of categories in a circular chart.
  • Legend – A key that identifies what each color or symbol represents in a plot.
  • Regression Line – A line that shows the general trend in scatter plot data (e.g., using sns.regplot).
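
A minimal plotting sketch, assuming matplotlib, seaborn, and numpy are installed; the heart-rate values are simulated for illustration:

    import matplotlib.pyplot as plt
    import seaborn as sns
    import numpy as np

    rng = np.random.default_rng(42)
    heart_rates = rng.normal(80, 12, 200)   # simulated values

    # A figure containing two axes (subplots)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

    # Histogram: frequency distribution of a numeric variable
    ax1.hist(heart_rates, bins=20)
    ax1.set_title('Heart rate distribution')

    # Box plot: spread and outliers via quartiles
    sns.boxplot(y=heart_rates, ax=ax2)
    ax2.set_title('Heart rate box plot')

    plt.tight_layout()
    plt.show()
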
V. Descriptive and Inferential Statistics
  • Statistics – The branch of mathematics dealing with the collection, analysis, interpretation, and presentation of data.
  • Descriptive Statistics – Summarize data features without drawing conclusions beyond the data itself.
  • Inferential Statistics – Use sample data to make generalizations about a larger population.

🔹 Descriptive Measures

  • Mean – The arithmetic average.
  • Median – The middle value when data are sorted.
  • Mode – The most frequently occurring value.
  • Range – Difference between the maximum and minimum values.
  • Standard Deviation (SD) – Measures how spread out the data are from the mean.
  • Variance – Square of the standard deviation.
  • Interquartile Range (IQR) – Difference between the 75th and 25th percentiles.
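
These measures map directly onto pandas methods; a quick sketch with invented values:

    import pandas as pd

    values = pd.Series([72, 75, 80, 80, 95, 110])

    print(values.mean())                 # mean
    print(values.median())               # median
    print(values.mode())                 # mode(s)
    print(values.max() - values.min())   # range
    print(values.std())                  # standard deviation (SD)
    print(values.var())                  # variance (SD squared)
    iqr = values.quantile(0.75) - values.quantile(0.25)
    print(iqr)                           # interquartile range (IQR)
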
VI. Hypothesis Testing
  • Null Hypothesis (H₀) – The default assumption that there is no effect or difference.
  • Alternative Hypothesis (H₁) – States that there is an effect or difference (the conclusion supported when we reject the null hypothesis).
  • p-value – Probability of observing the data (or more extreme) assuming H₀ is true.
  • Significance Level (α) – Threshold (commonly 0.05) used to decide whether to reject H₀.
  • Type I Error – Incorrectly rejecting a true null hypothesis (false positive).
  • Type II Error – Failing to reject a false null hypothesis (false negative).
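
A sketch of the decision rule using a one-sample t-test from scipy.stats; the sample values and the hypothesized population mean are simulated and purely illustrative:

    from scipy import stats
    import numpy as np

    rng = np.random.default_rng(0)
    sample = rng.normal(loc=82, scale=10, size=30)   # simulated heart rates

    # H0: the population mean heart rate is 75 BPM
    t_stat, p_value = stats.ttest_1samp(sample, popmean=75)

    alpha = 0.05   # significance level
    if p_value < alpha:
        print(f'p = {p_value:.4f} < {alpha}: reject H0')
    else:
        print(f'p = {p_value:.4f} >= {alpha}: fail to reject H0')
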
VII. Common Statistical Tests
  • T-test – Compares means between groups.
    • One-sample t-test: compares sample mean to a known population mean.
    • Independent (two-sample) t-test: compares means of two independent groups.
    • Paired t-test: compares means of the same group before and after a treatment.
  • ANOVA (Analysis of Variance) – Tests differences among means of three or more groups.
    • One-way ANOVA: compares one factor across multiple groups.
    • Two-way ANOVA: examines the effect of two factors simultaneously.
    • Post-hoc tests: pairwise comparisons following a significant ANOVA result.
  • Chi-Square Test – Tests relationships between categorical variables.
    • Test of Independence: determines if two categorical variables are related. This is the more commonly used form.
    • Goodness of Fit: checks whether observed data match an expected distribution, used when a theoretical distribution or set of expected percentages is available.
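
Brief sketches of these tests with scipy.stats, using simulated groups and an invented contingency table:

    from scipy import stats
    import numpy as np

    rng = np.random.default_rng(1)
    group_a = rng.normal(80, 10, 30)
    group_b = rng.normal(85, 10, 30)
    group_c = rng.normal(90, 10, 30)

    # Independent (two-sample) t-test: compare means of two groups
    t_stat, p = stats.ttest_ind(group_a, group_b)

    # One-way ANOVA: compare means across three or more groups
    f_stat, p = stats.f_oneway(group_a, group_b, group_c)

    # Chi-square test of independence on a 2x2 contingency table
    table = [[30, 10], [20, 25]]   # e.g., treatment vs. outcome counts
    chi2, p, dof, expected = stats.chi2_contingency(table)
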
VIII. Correlation and Regression
  • Correlation – Measures the strength and direction of a linear relationship between two variables (range: -1 to +1).
    • Positive correlation: variables increase together.
    • Negative correlation: one increases while the other decreases.
  • Pearson’s r – The most common correlation coefficient for continuous variables.
  • Regression Analysis – Predicts the value of a dependent variable based on one or more independent variables.
  • Simple Linear Regression – Uses one predictor variable to predict an outcome.
    • Slope: change in outcome for every one-unit increase in predictor.
    • Intercept: predicted outcome when the predictor is zero.
    • R-squared: proportion of variance in the dependent variable explained by the independent variable.
  • Logistic Regression – Predicts binary outcomes (e.g., yes/no, 0/1) using continuous or categorical predictors.
    • Odds Ratio: how much the odds of the outcome change with a one-unit increase in the predictor.
    • Confusion Matrix: table showing correct and incorrect predictions.
    • Accuracy: proportion of correct predictions.
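
A short sketch with scipy.stats and scikit-learn, using simulated age and blood-pressure values; the 130 mmHg cutoff below is an arbitrary illustrative threshold:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    age = rng.uniform(20, 80, 100)
    systolic_bp = 100 + 0.5 * age + rng.normal(0, 8, 100)   # simulated

    # Pearson's r: strength and direction of a linear relationship
    r, p = stats.pearsonr(age, systolic_bp)

    # Simple linear regression: slope, intercept, and R-squared
    result = stats.linregress(age, systolic_bp)
    print(result.slope, result.intercept, result.rvalue ** 2)

    # Logistic regression: predict a binary outcome (illustrative cutoff)
    from sklearn.linear_model import LogisticRegression
    high_bp = (systolic_bp > 130).astype(int)
    model = LogisticRegression().fit(age.reshape(-1, 1), high_bp)
    odds_ratio = np.exp(model.coef_[0][0])   # per one-year increase in age
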
IX. Data Science Utilities
  • Train-Test Split – Dividing data into training (for model building) and testing (for evaluation).
  • Random Seed – Ensures reproducibility by fixing random number generation.
  • Feature – An independent variable or predictor in a dataset.
  • Target Variable – The dependent variable or outcome being predicted.
  • Normalization / Scaling – Adjusting data values to a common scale without distorting differences.
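
A minimal sketch with scikit-learn, using randomly generated features and targets:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(3)
    X = rng.normal(size=(100, 2))   # features (predictors)
    y = rng.integers(0, 2, 100)     # target variable (binary outcome)

    # Train-test split with a fixed random seed for reproducibility
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Normalization / scaling: fit on training data, apply to both sets
    scaler = StandardScaler().fit(X_train)
    X_train_scaled = scaler.transform(X_train)
    X_test_scaled = scaler.transform(X_test)
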

Getting Started

Ready to begin your Python journey? Let’s dive into the fundamentals and start building your programming skills from the ground up. The next section will introduce you to the basic building blocks of Python programming.

Important: Before the workshop begins, please install Python and Anaconda on your laptop. Don’t worry — I’ll guide you step by step during the first session.

Let's install Python on your computer!
  1. To download Python, go to https://www.python.org/downloads/ and click ‘Download Python x.xx.x’. There are separate download options depending on your operating system (e.g., Windows, macOS, or Linux). Double-click the downloaded file and follow the default options for installation.
  2. To download Anaconda, go to https://www.anaconda.com/download/success and click one of the options under “Distribution Installers”. Once the download finishes, double-click the downloaded file and follow the default options for installation.
    • Anaconda is a free and open-source distribution of the Python and R programming languages for scientific computing. It simplifies package management and deployment, making it a popular choice for data science.

Once installed, open Anaconda Navigator, find Jupyter Notebook, and click Launch — you’re ready to code!
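
To verify the installation, you can run the following in a new Jupyter Notebook cell; it should print version numbers without errors:

    import sys
    import pandas as pd
    import matplotlib

    print(sys.version)              # Python version
    print(pd.__version__)           # pandas version
    print(matplotlib.__version__)   # matplotlib version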