SciPy 2021 | Tutorial Instructions

Tutorial Information and Participant Instructions

Tutorial setup instructions will be added in June. A Slack channel will be set up for each tutorial in case you have trouble with setup or have additional questions.

Beginner Tutorials

Monday, July 12, 2021 at 2:00:00 PM UTC

Introduction to Python and Programming

Matt Davis

This tutorial is a gentle introduction to Python for folks who are completely new to it and may not have much experience programming. We’ll work in a Jupyter Notebook, one of the most popular tools in scientific Python. You’ll learn how to write beautiful Python while practicing loops, if statements, functions, and Python’s built-in features in a series of fun, interactive exercises. By the end of the tutorial we think you’ll be ready to write your own basic Python. Most importantly, we want you to learn the form and vocabulary of Python so that you can understand Python documentation, interpret code written by others, and get the most out of other SciPy tutorials.

Tutorial Prerequisites: None

Setup instructions: https://github.com/jiffyclub/scipy-2021-intro-to-python#introduction-to-python-at-scipy-2020

Monday, July 12, 2021 at 2:00:00 PM UTC

Bayesian Data Science by Simulation

Hugo Bowne-Anderson and Eric Ma

This tutorial is an introduction to Bayesian data science through the lens of simulation, or hacker statistics. We will become familiar with many common probability distributions by i) matching them to real-world stories and ii) simulating them. We will work with joint and conditional probabilities, Bayes' theorem, prior and posterior distributions, and likelihoods while seeing their applications in real-world data analyses. We’ll see the utility of Bayesian inference in parameter estimation and in comparing groups, and we’ll wrap up with a dive into the wonderful world of probabilistic programming.
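
As a small illustration of the simulation approach described above, the sketch below estimates a probability by repeated random draws instead of a formula. It uses only Python's standard `random` module. The function names (`simulate_heads`, `estimate_probability`), the coin bias, and the sample sizes are illustrative assumptions and are not taken from the tutorial materials.

```python
import random


def simulate_heads(n_flips, p_heads, rng):
    """Simulate one binomial draw: the number of heads in n_flips tosses."""
    return sum(rng.random() < p_heads for _ in range(n_flips))


def estimate_probability(n_trials, n_flips, p_heads, threshold, seed=0):
    """Estimate P(heads >= threshold) by brute-force simulation."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = sum(
        simulate_heads(n_flips, p_heads, rng) >= threshold
        for _ in range(n_trials)
    )
    return hits / n_trials


# Assumed scenario: 10 flips of a fair coin, at least 8 heads.
# The exact binomial answer is 56/1024 (about 0.0547), so the
# simulated estimate should land close to that value.
estimate = estimate_probability(
    n_trials=20_000, n_flips=10, p_heads=0.5, threshold=8
)
print(f"Estimated P(at least 8 heads in 10 fair flips): {estimate:.3f}")
```

Increasing `n_trials` tightens the estimate, at the cost of a longer run.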

Tutorial Prerequisites: Knowledge of `numpy`, `matplotlib`, and Python are prerequisites for this tutorial, in addition to curiosity and an excitement to learn new things!

Setup instructions: https://github.com/ericmjl/bayesian-stats-modelling-tutorial

Monday, July 12, 2021 at 2:00:00 PM UTC

The Jupyter Interactive Widget Ecosystem

Matthew Craig, Itay Dafna, Martin Renou, Mariana Meireles and Youness Bennani

Jupyter widgets are powerful tools for building user interfaces with graphical controls such as sliders and text boxes inside a Jupyter notebook. Interactive widgets can also be rendered in Sphinx documentation, nbviewer, and static or interactive web pages. The recent release of version 8.0 of the core widgets package adds several new features. Jupyter widgets are also a framework that makes it easy to build custom GUI controls. Examples of custom widget packages include libraries for interactive 2-D charting, 3-D graphics, mapping, complex controls, and more.

Tutorial Prerequisites: Previous experience with Python (any level) is required, and some experience with the basics of Jupyter Lab or Notebooks would be helpful.

Setup instructions: https://github.com/jupyter-widgets/tutorial#installation

Monday, July 12, 2021 at 7:00:00 PM UTC

Network Analysis Made Simple

Mridul Seth

Through the use of NetworkX's API, tutorial participants will learn about the basics of graph theory and its use in applied network science. Starting with a computationally-oriented definition of a graph and its associated methods, we will build out into progressively more advanced concepts (path and structure finding, and graph theory's relation to linear algebra), as well as an overview of scalable alternatives to NetworkX.

Tutorial Prerequisites: Python programming; use of built-in data structures and flow control (conditionals, loops, dictionaries, lists).

Prior experience with NumPy is helpful, but not required.

Setup instructions: https://ericmjl.github.io/Network-Analysis-Made-Simple/00-preface/01-setup/

Monday, July 12, 2021 at 7:00:00 PM UTC

Introduction to Numerical Computing With NumPy

Logan Thomas and Alexandre Chabot-Leclerc

NumPy provides Python with a powerful array processing library and an elegant syntax that is well suited to expressing computational algorithms clearly and efficiently. We'll introduce basic array syntax and array indexing, review some of the available mathematical functions in NumPy, and discuss how to write your own routines. Along the way, we'll learn just enough about matplotlib to display results from our examples.

Tutorial Prerequisites: The tutorial is intended for people new to the scientific Python ecosystem. Previous experience in Python or another programming language is useful but not required.

Requirements: Python 3.6+ with NumPy and Matplotlib is required. Students may follow the installation instructions for Anaconda.

Setup instructions: https://github.com/enthought/Numpy-Tutorial-SciPyConf-2021

Tuesday, July 13, 2021 at 2:00:00 PM UTC

Learn Python Through Data Processing in Pandas

Daniel Chen

Python has become the lingua franca of data science and machine learning. However, the "DataFrame" and "array" data structures used in data science call for a different teaching and learning approach from more traditional introductory programming classes. A DataFrame-centric teaching approach means more complex topics need to be taught earlier than in traditional programming classes, but this complexity is offset by its pragmatism. This is a tutorial for beginners on using the Pandas library in Python for data manipulation. We will focus on tidy data principles and how they fit into the data manipulation framework.

Tutorial Prerequisites: No prior knowledge is needed to attend the workshop. Only pandas, seaborn, and scikit-learn are needed, and all three are already installed in the Anaconda distribution.

Setup instructions: https://github.com/chendaniely/2021-07-13-scipy-pandas

Monday, July 12, 2021 at 7:00:00 PM UTC

HoloViz: Visualize all your data easily, from notebooks to dashboards

James A. Bednar, Philipp Rudiger and Jean-Luc Stevens

This tutorial will show you how to turn nearly any notebook into a deployable dashboard, how to build visualizations easily even for big, streaming, and multidimensional data, and how to build linked, interactive drill-down exploratory tools without having to run a web-technology software development project.

Tutorial Prerequisites: Experience plotting data with any tool (Python or otherwise), along with the basics of either NumPy or pandas, should be sufficient background for understanding and appreciating the approach used here.

Setup instructions: https://github.com/holoviz/holoviz/blob/master/doc/installation.rst

Tuesday, July 13, 2021 at 2:00:00 PM UTC

Magical NumPy with JAX

Eric Ma

JAX's magic brings your NumPy game to the next level! Come learn how to write loop-less numerical loops, optimize _any_ function, jit-compile your programs, and gain reliability over stochastic numbers. In short, equip yourself with a bag of tricks to help you write robust numerical programs.

Tutorial Prerequisites: If you're comfortable with the NumPy API, then you'll be well-equipped for this tutorial. This is a tutorial that will equip you beyond simple deep learning; instead of learning how to use a deep learning framework, you'll leave equipped with a toolkit to write high-performance numerical models of the world and optimize them with gradient descent. Familiarity with Jupyter will help. Local setup is not necessary; Binder will be an option for tutorial participants.

Setup instructions: https://ericmjl.github.io/dl-workshop/

Tuesday, July 13, 2021 at 2:00:00 PM UTC

Fairness in AI Systems: From Social Context to Practice using Fairlearn

Triveni Gandhi, Manojit Nandi, Miro Dudík, Hanna Wallach, Michael Madaio, Hilde Weerts, Adrin Jalali and Lisa Ibañez

Fairness in AI systems is an interdisciplinary field of research and practice that aims to understand and address some of the negative impacts of AI systems on society. In this tutorial, we will walk through the process of assessing and mitigating fairness-related harms in the context of the U.S. health care system. This tutorial will consist of a mix of instructional content and hands-on demonstrations using Jupyter notebooks. Participants will use the Fairlearn library to assess an ML model for performance disparities across different racial groups and mitigate those disparities using a variety of algorithmic techniques.

Tutorial Prerequisites: Participants are expected to have intermediate Python skills and familiarity with Scikit-Learn. For maximal benefit, participants should have some experience training and evaluating supervised models in Python.

Setup instructions: https://github.com/fairlearn/talks/tree/main/2021_scipy_tutorial

Tuesday, July 13, 2021 at 7:00:00 PM UTC

Bioimage analysis fundamentals in Python

Nicholas Sofroniew

In this tutorial, we will explore some of the most critical Python libraries for scientific computing on images, by walking through fundamental bioimage analysis applications of linear filtering (aka convolutions), segmentation, and object measurement, leveraging the napari viewer for interactive visualisation and processing. We will also demonstrate how to extend these concepts to bigger-than-RAM images using Dask.

Tutorial Prerequisites: Attendees should have some existing Python experience (comfortable with features like writing functions and classes, and executing code in Jupyter notebooks), as well as experience with the scientific Python ecosystem (e.g. NumPy and SciPy). Some basic image processing experience (e.g. with scikit-image) is useful but not required.

Setup instructions: https://github.com/sofroniewn/tutorial-scipy2021-bioimage-analysis-fundamentals

Tuesday, July 13, 2021 at 7:00:00 PM UTC

(D)Ask me Anything About Data Analytics at Scale

Ramon Perez

The size of datasets in diverse industries keeps increasing by the minute. Luckily for data professionals who use Python as their day-to-day tool, there is Dask, a library for large-scale data analytics and data preprocessing. This tutorial uses Dask and other libraries to teach you diverse techniques for data exploration, hypothesis testing, and dashboard creation using large datasets. In addition, this tutorial takes a top-down approach: in each of the major sections you will start with the results and then work your way backwards through the large-scale data analytics process.

Tutorial Prerequisites: The target audience for this session includes analysts of all levels, developers, data scientists, and engineers wanting to learn how to analyze large amounts of data that don’t fit into the memory of their machines.

This tutorial is at the intermediate level and requires that participants have at least 1 year of experience coding in Python. The following are some of the Prerequisites (P) and Good-to-Haves (GTH):

- (P) Attendees for this tutorial are expected to be familiar with Python (1 year of coding).
- (P) Participants should be comfortable with loops, functions, list comprehensions, and if-else statements.
- (GTH) While it is not necessary to have knowledge of Dask, pandas, NumPy, Datashader, and HoloViews, a bit of experience with these libraries would be very beneficial throughout this tutorial.
- (P) Participants should have at least 6 GB of free memory in their computers.
- (GTH) While it is not required to have experience with an integrated development environment like JupyterLab, this would be very beneficial for the session, as it is the tool we will be using throughout.

Setup instructions: https://github.com/ramonpzg/scipyus21_dask_analytics

Tuesday, July 13, 2021 at 7:00:00 PM UTC

Hands-on Introduction to Property-Based Testing for Science

Zac Hatfield-Dodds and Ryan Soklaski

Code is now a critical part of almost all research, whether for communication or for data collection and analysis. Unfortunately, producing reliably error-free code is an open problem, and result-altering bugs are regularly found (and usually fixed) in everything from preprints to foundational open source packages.

I believe there is a core, fixable problem: writing tests is tedious, difficult, and only covers edge cases we know to test for. The solution? Use tools that write tests for us!

Crucially, this isn't a pipe dream: it's a proven technique that the SciPy ecosystem has already started to use, and it just needs to scale up. Hypothesis (https://hypothesis.readthedocs.io/) is basically a superhuman experimentalist. You write a test function and describe what inputs it should pass for, and the Hypothesis engine searches for a falsifying example.

This process often tries inputs that I wouldn't think of, such as NumPy arrays with a size-zero dimension or a "signalling" NaN represented by a non-standard bit pattern; as a result, it regularly uncovers bugs that users *and authors* didn't know were possible.
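
The workflow described above (state a property, then let a tool hunt for inputs that break it) can be sketched without any third-party dependency. The snippet below is a hand-rolled random search that uses only the standard library; it is not the Hypothesis API, and the names `buggy_mean`, `property_holds`, and `find_falsifying_example` are hypothetical. Hypothesis itself adds far smarter input generation and shrinks failing inputs to minimal examples.

```python
import random


def buggy_mean(values):
    """Average of a list; deliberately forgets the empty-list case."""
    return sum(values) / len(values)


def property_holds(values):
    """Property under test: the mean lies between the min and the max."""
    try:
        result = buggy_mean(values)
    except ZeroDivisionError:
        return False  # a crash counts as falsifying the property
    return min(values) <= result <= max(values)


def find_falsifying_example(prop, n_trials=200, seed=0):
    """Try random integer lists and return the first that breaks prop."""
    rng = random.Random(seed)  # seeded so the search is reproducible
    for _ in range(n_trials):
        size = rng.randint(0, 5)  # size zero is allowed on purpose
        candidate = [rng.randint(-100, 100) for _ in range(size)]
        if not prop(candidate):
            return candidate
    return None  # no counterexample found within the budget


# The only input that breaks this property is the empty list,
# which a hand-written test suite could easily overlook.
counterexample = find_falsifying_example(property_holds)
print("Falsifying example:", counterexample)
```

The size-zero input plays the same role here as the size-zero array dimension mentioned in the description: an edge case that is easy to forget and cheap for a machine to try.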

If this sounds too good to be true, Hypothesis has quickly found bugs in approximately everything it has ever been pointed at, including NumPy, Astropy, Xarray, CPython, and of course Hypothesis itself. If you already have a large stack of bug reports, maybe ask that it only be used for new features!

Tutorial Prerequisites: This tutorial is designed for researchers and software engineers who regularly write code that other scientists rely on. You might be 'the Python person' in your lab; a core developer of one of the core SciPy or PyData libraries, or an enthusiast looking for a valuable way to contribute to that ecosystem.

Attendees are expected to be familiar with NumPy and pandas, as well as with traditional unit testing (e.g. pytest or unittest), and ideally with writing traditional tests for numerical or data-centric code. You should also be familiar with Python's syntax for decorators (such as `@functools.cache`) and lambda functions (e.g. `lambda x: x > 0`).

You don't need to be an expert in any of these, but the tutorial will still have plenty of content to engage those who are!
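
For reference, the two pieces of syntax named in the prerequisites look like this in practice. This is a minimal sketch: the `fibonacci` function, the `is_positive` name, and the sample list are illustrative assumptions, and `functools.cache` requires Python 3.9 or later.

```python
import functools


@functools.cache  # decorator: memoizes results keyed by the arguments
def fibonacci(n):
    """Naive recursive Fibonacci; caching avoids recomputing subproblems."""
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)


# Lambda: a small anonymous function, here bound to a name for reuse.
is_positive = lambda x: x > 0

# Keep only the values for which the lambda returns True.
positives = list(filter(is_positive, [-2, 0, 3, 5]))

print(fibonacci(30))  # 832040
print(positives)      # [3, 5]
```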

Setup instructions: https://github.com/rsokl/testing-tutorial/

Intermediate Tutorials
