Posters


The creative art of algorithmic embroidery

Sunday 10 a.m.–1 p.m. in Expo Hall A

For thousands of years, people have created beautiful patterns through intricate needlework. Many of these patterns use algorithmic concepts like repetition, recursion and variation to build complex motifs from simple rules. In this poster, we explore the art of embroidery through Python programming and show how you can create your own patterns with code.

The poster will contain examples that use loops, randomness and recursive functions, relying only on Python's built-in turtle library, to turn straightforward commands into elaborate and intricate artwork. We will also show how you can turn your art into embroidery patterns readable by an embroidery machine using the TurtleThread library. This poster is for anyone interested in the intersection of Python programming, creative coding, and arts and crafts!
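
Below is a minimal sketch, not the authors' code, of the kind of looping turtle pattern described above: a few simple rules repeated many times produce an intricate motif, and a touch of randomness makes every run unique. The TurtleThread step that converts such drawings into machine-readable stitch files is omitted here, since its API is not shown in the abstract.

```python
# Illustrative sketch using only the standard-library turtle module.
import random
import turtle

t = turtle.Turtle()
t.speed(0)

def rosette(sides=36, size=120):
    """Draw a rotated-square rosette using plain repetition."""
    for _ in range(sides):
        for _ in range(4):          # one small square
            t.forward(size)
            t.left(90)
        t.left(360 / sides)         # rotate before drawing the next square

def jittered_spiral(steps=80):
    """A variant that adds randomness so no two runs are identical."""
    for i in range(steps):
        t.forward(3 + i)
        t.left(89 + random.uniform(-2, 2))

rosette()                           # swap in jittered_spiral() for the random variant
turtle.done()
```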


Establishing Baseline Open Source Repository Metrics Using Maturity Models

Sunday 10 a.m.–1 p.m. in Expo Hall A

In Fall 2023, volunteers from USDigitalResponse.org began collaborating with the United States Department of Health and Human Services (HHS.gov) and the Centers for Medicare and Medicaid Services (CMS.gov) to support Federal Open Source Projects. What originally began as a project to create an Open Source Runbook specifically for the https://beta.grants.gov API Community was then spun out into a generalized set of templates and scripts, based on a Repository Maturity Model, that help projects standardize their repo content and practices.

Our maturity model is a method for formally assessing the status of an open source project based on repository hygiene metrics, such as whether certain goals, policies, and practices are in place, and on the level of community activity.

Most healthy open source repositories have certain elements in common, such as a README, a LICENSE, and other files. Checklists and templates can be used to ensure that a repository meets these general standards, and automated tooling can be used to create and regularly assess repository health on an ongoing basis. USDR and CMS worked together to develop a suite of tools, Python scripts, CLIs, GitHub workflows, and documentation resources to enable this type of automation in the repo-scaffolder project (https://github.com/dsacms/repo-scaffolder).

We are sharing our maturity models and associated tooling in hopes that other open source projects, program offices, and organizations can benefit from understanding how our federal agency values repository health and, in our opinion, what metrics and practices are important to a healthy repository. The healthier our repository ecosystem is, the less risk we face when depending on each other’s work, and the easier it is for us all to work together and contribute in harmony. https://github.com/dsacms/repo-scaffolder/blob/main/maturity-model-tiers.pdf


Funix, the laziest way to build apps in Python

Sunday 10 a.m.–1 p.m. in Expo Hall A

Funix effortlessly transforms your Python functions or classes into web apps. By extracting type hints from the signature of a function, Funix generates the GUI according to a type-to-widget mapping defined in a theme (such as the default theme). With Funix, developers can turn existing code into apps with no or minimal effort. Like CSS to HTML or macros to LaTeX, Funix separates appearance from content for convenience, manageability, and consistency. All UI-related configurations, whether a theme or per-app/function settings, are declaratively handled via JSON strings outside the core logic.

Funix ingeniously leverages Python's features and ecosystem. For example, the native Python types str, bool, Literal, and range are mapped to the following UI widgets, respectively: an input box, a checkbox, a set of radio buttons, and a slider. It further maps the popular data science types pandas.DataFrame and matplotlib.figure.Figure to tables and charts. It redirects Python's native print() function to the frontend for conveniently adding content to the web page, and uses yield for streaming outputs. As a result, developers can stick to their existing coding habits and knowledge without a steep Funix learning curve.
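
The sketch below illustrates this idea, under the assumption that a plain, type-hinted function is all Funix needs; the launch command in the comment reflects the project's documented usage and should be checked against the Funix docs.

```python
# Illustrative example: each parameter's type hint determines the widget
# Funix renders (str -> input box, bool -> checkbox, Literal -> radio buttons).
from typing import Literal

def greet(name: str, shout: bool = False,
          greeting: Literal["Hello", "Hi", "Hey"] = "Hello") -> str:
    message = f"{greeting}, {name}!"
    return message.upper() if shout else message

# Running this file through the Funix launcher (e.g. `funix greet_app.py`)
# would expose `greet` as a web app with one widget per parameter.
```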

Funix supports many use cases. Anyone, especially startup teams, can use Funix to quickly build demos, collect user feedback, and iterate on their MVPs. STEM researchers can use Funix to build in-browser playgrounds for their models and algorithms, or viewers for their data. Educators can use Funix to bring functions and classes to life so that students can play with code without calling functions from a daunting terminal.

Funix is fully open-source under the MIT license. Its website is http://funix.io


Simulating a Cricket Match in Python

Sunday 10 a.m.–1 p.m. in Expo Hall A

Cricket, with a fanbase of over 2 billion, is the second most popular sport in the world after soccer. Unfortunately, like other sporting events, cricket matches were heavily affected during the Covid-19 pandemic. This simulation project, built in Python, tries to replicate real cricket matches virtually.

In cricket, a team has eleven players. Each ball, or delivery, is an event, and the cumulative results of these events (runs or wickets) decide the winner. This project picks two teams and gathers statistics and career profiles for each player using the ESPNcricinfo API. The result of each ball is simulated based on the player's career statistics. The live score is streamed ball by ball, and the winner is decided based on the cumulative runs.
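
As a rough illustration of the per-ball simulation described above (not the project's actual code), one delivery's outcome can be sampled from a batter's career numbers; the probabilities and statistics below are made up, whereas the real project pulls them from the ESPNcricinfo API.

```python
import random

def simulate_ball(strike_rate, average):
    """Return the runs scored on this delivery, or 'W' for a wicket."""
    p_wicket = 1.0 / (average * 1.65) if average > 0 else 0.05  # rough per-ball dismissal chance
    if random.random() < p_wicket:
        return "W"
    # Distribute the remaining probability over run outcomes, scaled by strike rate.
    if random.random() > (strike_rate / 100.0) * 0.6:
        return 0
    return random.choices([1, 2, 3, 4, 6], weights=[45, 15, 2, 28, 10])[0]

deliveries = [simulate_ball(strike_rate=88.0, average=45.0) for _ in range(120)]
runs = sum(r for r in deliveries if isinstance(r, int))
wickets = deliveries.count("W")
print(f"Score: {runs}/{wickets} after {len(deliveries)} balls")
```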


Development of a Novel Deep Learning Based Multimodal Biomedical Image Segmentation and Diagnostic System Using Python

Sunday 10 a.m.–1 p.m. in Expo Hall A

Automatic biomedical image segmentation plays an important role in speeding up disease detection and diagnosis. The rapid development of Deep Learning technology has shown ground-breaking improvements in many fields. However, the task of medical diagnosis is challenging because it is much more sensitive and vulnerable to mistakes. The stakes are high since people's lives can be at risk. An AI-powered first response tool can come in handy in the absence of, or in conjunction with, an experienced health expert.

This poster proposes an integrated pipeline to diagnose various kinds of disease from different modalities of images. Currently, the system supports five types of images: chest X-ray, dermoscopy, microscopy, ultrasound, and MRI. First, the Image Input Module takes an image and performs the necessary preprocessing steps. The image is then passed as input to the Segmentation Module, which is built around CNL-UNet, a lightweight deep learning image segmentation model. The model marks the affected region, or region of interest, in the image. Additionally, the model produces error-correction information that improves the accuracy of the segmentation output. Combining the two gives the system more confidence in its decision. Finally, the Report Generation Module takes the position of the region of interest, the area of the affected region, and the confidence score as inputs and generates a final medical report using fuzzy inference.
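
To make the fuzzy-inference step concrete, here is a generic scikit-fuzzy sketch of how segmentation outputs might be mapped to a severity score for the report; the variables, membership functions, and rules are illustrative assumptions, not the system's actual ones.

```python
import numpy as np
import skfuzzy as fuzz
from skfuzzy import control as ctrl

# Inputs: percentage of the image affected and the model's confidence score.
area = ctrl.Antecedent(np.arange(0, 101, 1), "affected_area_pct")
confidence = ctrl.Antecedent(np.arange(0, 1.01, 0.01), "confidence")
severity = ctrl.Consequent(np.arange(0, 101, 1), "severity")

area["small"] = fuzz.trimf(area.universe, [0, 0, 40])
area["large"] = fuzz.trimf(area.universe, [30, 100, 100])
confidence["low"] = fuzz.trimf(confidence.universe, [0, 0, 0.6])
confidence["high"] = fuzz.trimf(confidence.universe, [0.4, 1, 1])
severity["mild"] = fuzz.trimf(severity.universe, [0, 0, 50])
severity["severe"] = fuzz.trimf(severity.universe, [50, 100, 100])

rules = [
    ctrl.Rule(area["large"] & confidence["high"], severity["severe"]),
    ctrl.Rule(area["small"] | confidence["low"], severity["mild"]),
]
report = ctrl.ControlSystemSimulation(ctrl.ControlSystem(rules))
report.input["affected_area_pct"] = 35.0
report.input["confidence"] = 0.8
report.compute()
print(report.output["severity"])   # crisp severity score to include in the report
```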

The system is built with several Python packages: OpenCV for image processing, TensorFlow/Keras for deep learning, scikit-learn for machine learning, scikit-fuzzy (skfuzzy) for fuzzy inference, and PyQt5 for the desktop interface. We believe this poster will exhibit the effective use of open-source Python packages in developing a robust medical diagnosis tool and allow us to connect with similarly interested Python practitioners.


Sim2Real Transfer for Robots using Object Detectors and Python Libraries

Sunday 10 a.m.–1 p.m. in Expo Hall A

High-fidelity simulations for drones and other aerial vehicles may look incredibly lifelike, but it is difficult to learn control policies in a simulator and then apply them in the real world. One explanation is that real camera images, particularly on low-power drones, produce output that differs from simulated images, setting aside the fact that, at the level relevant to machine learning, simulated worlds themselves look somewhat different from real ones. To get around this constraint, we focus on object detectors, which typically transfer well from simulation to the real world, and extract features from the detected objects to feed into reinforcement learning algorithms.
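
One simple way to turn detector output into an observation for a reinforcement learning policy is sketched below; this is an illustrative assumption about the feature encoding, not the authors' implementation.

```python
import numpy as np

def detections_to_state(detections, img_w, img_h, max_objects=3):
    """Convert detections [(class_id, conf, x, y, w, h), ...] into a fixed-length,
    normalized observation vector that an RL policy can consume."""
    state = np.zeros(max_objects * 5, dtype=np.float32)
    for i, (cls, conf, x, y, w, h) in enumerate(detections[:max_objects]):
        state[i * 5: i * 5 + 5] = [
            cls, conf, x / img_w, y / img_h, (w * h) / (img_w * img_h)
        ]
    return state

# Example: one detected gate-like object in a 640x480 frame.
obs = detections_to_state([(0, 0.92, 310, 220, 80, 60)], img_w=640, img_h=480)
```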


Python Tool for Tracking the Movement of the International Space Station and Identifying the Weather Condition along the Track

Sunday 10 a.m.–1 p.m. in Expo Hall A

At any given time, it is possible to know the planar position (latitude, longitude) of the International Space Station (ISS). We present a Python application (relying on the packages pandas, Shapely, GeoPandas and MovingPandas) that collects a time series of ISS positions over several hours, identifies the cycles (each time the ISS crosses the central meridian), and plots the tracks. We also determine the weather conditions (temperature, surface pressure) at each land point along the track and perform interactive visualizations.
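
A hedged sketch of that workflow is shown below; the position feed used here (Open Notify's public ISS API) is an assumption for illustration, and the crossing detection, weather lookup, and interactive plots from the poster are omitted.

```python
import time

import geopandas as gpd
import movingpandas as mpd
import pandas as pd
import requests

records = []
for _ in range(10):                                   # collect a short time series
    pos = requests.get("http://api.open-notify.org/iss-now.json").json()
    records.append({
        "t": pd.Timestamp.utcnow(),
        "lat": float(pos["iss_position"]["latitude"]),
        "lon": float(pos["iss_position"]["longitude"]),
    })
    time.sleep(10)

df = pd.DataFrame(records).set_index("t")
gdf = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df.lon, df.lat), crs="EPSG:4326")
traj = mpd.Trajectory(gdf, traj_id="ISS")             # MovingPandas trajectory
traj.plot()                                           # quick look at the ground track
```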


From Dataset to Features: A Python-Based Evolutionary Approach

Sunday 10 a.m.–1 p.m. in Expo Hall A

Multilabel classification is a machine learning task where each instance in a dataset is assigned to multiple labels. This is in contrast to traditional classification, where each instance is assigned to a single label. Multilabel classification has gained popularity in recent years due to its expanding use in a variety of applications across domains.

One of the key challenges in multilabel classification is the high dimensionality of the data, which can make it difficult for machine learning algorithms to learn effectively. This is where feature selection comes in. Feature selection is the process of identifying and selecting a subset of relevant and non-redundant features from a larger set of features. It is a critical preprocessing step that can improve the performance and efficiency of machine learning algorithms.

One of the recent developments in feature selection is the Black Hole algorithm, inspired by the phenomenon of black holes in space. The Black Hole algorithm is a metaheuristic that iteratively removes the least relevant features from a dataset based on a relevance measure such as mutual information. In this poster, we present a modified standalone Black Hole algorithm that incorporates genetic algorithm operators, such as crossover and mutation, to improve its performance on multilabel classification problems. The hybridization of the Black Hole and genetic algorithms has been shown to be effective for multilabel classification problems in different domains.
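
The schematic sketch below shows what such a hybrid loop could look like for binary feature masks; the fitness function, movement rule, and genetic operators are illustrative placeholders, and the actual operators and relevance measure in the presented work may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(mask, X, Y):
    """Placeholder relevance score for a feature subset; a real implementation
    would use e.g. mutual information or a classifier's validation score."""
    return rng.random() if mask.any() else 0.0

def black_hole_ga(X, Y, n_stars=20, n_iter=50):
    n_features = X.shape[1]
    stars = rng.integers(0, 2, size=(n_stars, n_features))    # candidate feature masks
    for _ in range(n_iter):
        scores = np.array([fitness(s, X, Y) for s in stars])
        bh = stars[scores.argmax()].copy()                    # best star becomes the black hole
        for i in range(n_stars):
            move = rng.random(n_features) < 0.3               # stars drift toward the black hole
            stars[i, move] = bh[move]
            if (stars[i] == bh).all():                        # "absorbed" stars are reborn randomly
                stars[i] = rng.integers(0, 2, n_features)
        # Genetic operators: one-point crossover between two random stars, then bit-flip mutation.
        a, b = rng.choice(n_stars, 2, replace=False)
        cut = rng.integers(1, n_features)
        stars[a, cut:], stars[b, cut:] = stars[b, cut:].copy(), stars[a, cut:].copy()
        flips = rng.random((n_stars, n_features)) < 0.01
        stars[flips] ^= 1
    return bh

X, Y = rng.random((100, 12)), rng.integers(0, 2, (100, 4))    # toy multilabel data
best_mask = black_hole_ga(X, Y)
```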


Safeguarding Our Forests with Python: Using YOLOv8 for Deforestation Detection

Sunday 10 a.m.–1 p.m. in Expo Hall A

In the realm of artificial intelligence (AI), computer vision has emerged as an indispensable tool across diverse sectors, offering innovative solutions to complex challenges. This poster seeks to leverage the potential of computer vision for environmental betterment, focusing specifically on the critical issue of deforestation. Through the use of cutting-edge technology, the poster aims to demystify the synergy between the theoretical foundations of computer vision and its practical applications. This inclusive presentation caters not only to seasoned computer vision engineers but also to developers less familiar with Python, elucidating the pivotal role that computer vision can play in addressing environmental concerns. The poster serves as a bridge between theory and practice, emphasizing the universal significance of deploying computer vision for the preservation of our planet.

Technical intricacies of computer vision-based deforestation detection will take center stage in this poster, highlighting the incorporation of advanced technologies such as Roboflow and YOLOv8. By using YOLOv8, an advanced object detection model, coupled with Roboflow for the collection and preparation of deforestation data, the method ensures a robust foundation for environmental monitoring. The poster is poised to showcase how these technologies can be seamlessly integrated to address the challenges associated with geographic data, underscoring their mutual benefits. Attendees will witness the smooth fusion of YOLOv8 with Roboflow, from data collection to preprocessing and augmentation, gaining valuable insights into optimizing workflows for deforestation detection. This technical exploration aims to educate both Python developers and computer vision experts by presenting a tangible perspective on the tools and methodologies that contribute to successful environmental preservation using computer vision.
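
A hedged sketch of that pipeline is shown below, following the documented usage patterns of the roboflow and ultralytics packages; the API key, workspace, project, and file names are placeholders, and exact parameters should be checked against both libraries' docs.

```python
from roboflow import Roboflow
from ultralytics import YOLO

# 1. Pull a labeled deforestation dataset prepared and augmented in Roboflow.
rf = Roboflow(api_key="YOUR_API_KEY")
dataset = rf.workspace("your-workspace").project("deforestation").version(1).download("yolov8")

# 2. Fine-tune a pretrained YOLOv8 model on the exported dataset.
model = YOLO("yolov8n.pt")
model.train(data=f"{dataset.location}/data.yaml", epochs=50, imgsz=640)

# 3. Run detection on a new aerial or satellite tile.
results = model.predict("aerial_tile.jpg", conf=0.4)
results[0].show()
```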

Category: Artificial Intelligence / Computer Vision Using Python

Audience Level: Some Experience to Advanced Experience


Lace: A Probabilistic Machine Learning tool for Scientific Discovery

Sunday 10 a.m.–1 p.m. in Expo Hall A

Lace is a probabilistic machine learning tool designed to facilitate scientific discovery by learning a model of the data instead of a model of a question.

Scientific breakthroughs are the result of exploring many questions. Often, the question is the breakthrough. But machine learning models require the user to know their question before they begin to model, e.g., “I wish to know Y given X”, which often causes researchers to waste time tuning models to answer questions that might not be valuable or even answerable.

Lace ingests pseudo-tabular data from which it learns a joint distribution over the table, after which users can ask any number of questions and explore the knowledge in their data with no extra modeling. Lace is both generative and discriminative, which allows users to

  • determine which variables are predictive of which others
  • predict quantities or compute likelihoods of any number of features conditioned on any number of other features
  • identify, quantify, and attribute uncertainty from variance in the data, epistemic uncertainty in the model, and missing features
  • generate and manipulate synthetic data
  • identify anomalies, errors, and inconsistencies within the data
  • determine which records/rows are similar to which others on the whole or given a specific context
  • edit, backfill, and append data without retraining

all in one place.


Stagecoaches vs Horses: An illustrated framework to evaluate the "monorepo" vs "multirepo" tradeoffs

Sunday 10 a.m.–1 p.m. in Expo Hall A

Given a diverse, complex codebase spanning multiple projects, application domains, and/or development teams, is there an optimal repository granularity to manage it as it grows larger and more complex over time?

This is a question that, in some form or another, will be familiar to many software maintainers. Though it is often framed as a "monorepo vs multirepo" binary, the choice of how best to version, release, and distribute the code can be seen as a spectrum between fewer, larger, cohesive repos (with "one single monorepo, no matter its size" as one end) and a higher number of smaller, independent repos (with "hundreds of teeny-tiny repos, as small as they can be" as the opposite end).

However, as is often the case when presented with such apparent dilemmas, the solution has less to do with finding "the" best answer and more with building effective ways of framing the problem and thinking about its tradeoffs in the context of each unique situation.

In our experience as maintainers of an open-source research software ecosystem, we have found the analogy between "stagecoaches" and "horses" effective in presenting the types of decisions that arise when evaluating "monorepos" and "multirepos". The poster presents these scenarios as illustrated cartoons, helping set the stage for discussions in which maintainers and developers share their experiences.


The Life and Times of Python Projects

Sunday 10 a.m.–1 p.m. in Expo Hall A

If someone wrote a biography of a Python software package, how would the story be told? What major events and trends might shape the story of its life? How might the life history of one project differ from -- and be surprisingly similar to -- the life history of other projects? In this poster, I depict both the specific life history of several Python projects and the patterns that emerge across projects.


Python in Reusable Rocket Technology

Sunday 10 a.m.–1 p.m. in Expo Hall A

Traditionally, large datasets from wind tunnel experiments have been processed and visualised using MATLAB. However, there is a need for an open-source alternative to make such tools accessible to a wider community, particularly young scientists and engineers. On this poster, large experimental datasets obtained in a wind tunnel at a flow speed of Mach 7, i.e. seven times the speed of sound, are analysed and visualised using Python. The steps of data processing, analysis, and visualisation are demonstrated, and the results are compared with identical results obtained from MATLAB. The strengths and weaknesses of Python and its libraries are discussed with regard to data reduction, signal processing, ease of coding, and quality of visualisation options.

The wind tunnel experiments use pressure-sensitive paint (PSP) in conjunction with a high-speed camera, a technique that allows the oxygen partial pressure on a surface in the flow to be determined. An area of interrogation of 140 mm x 37 mm is coated with PSP and tested with upstream injection of air, nitrogen, and helium. A relative concentration, ranging from 0% to 100%, is constructed pixel by pixel from the high-speed videos. The video data are treated with a stabilisation algorithm to remove the jitter that stems from the tunnel’s movement during the experiments. The data are then sent through several signal processing loops to reduce the noise.

The aim of this poster is to demonstrate that Python and its libraries are capable of producing scientific-quality visualisation of aerodynamics and wind tunnel data, and to persuade researchers, teachers, and students in this field to use this open-source resource.
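
As an illustration of two of the processing steps described above (not the authors' pipeline), the sketch below builds a pixel-by-pixel relative concentration from reference frames and smooths each pixel's time series; the formula, window sizes, and stand-in data are assumptions for demonstration only.

```python
import numpy as np
from scipy.signal import savgol_filter

# video: (n_frames, height, width) array of PSP intensities after stabilisation.
video = np.random.rand(500, 37, 140).astype(np.float32)       # stand-in data
i_air = video[:20].mean(axis=0)          # reference frames: 0% injectant (pure air)
i_injectant = video[-20:].mean(axis=0)   # reference frames: 100% injectant

# Pixel-wise relative concentration in percent, clipped to the physical range.
concentration = (video - i_air) / (i_injectant - i_air + 1e-9) * 100.0
concentration = np.clip(concentration, 0.0, 100.0)

# Temporal smoothing of each pixel's signal to reduce noise (illustrative settings).
concentration_smooth = savgol_filter(concentration, window_length=21, polyorder=3, axis=0)
```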


Tools for rapid data profiling to help you debug your data

Sunday 10 a.m.–1 p.m. in Expo Hall A

With more libraries making it easy to train models or run hypothesis tests in Python, it is increasingly easy to jump straight into analysis without actually understanding what is in a dataset. In part, this is because much of data exploration today is still very manual: analysts have to spend time crafting one-off visualizations just to get an overview of their data and discover potential issues. With this poster, I will discuss the idea of profiling your data with fast, interactive tools to augment a normal analysis workflow in Python. Rather than examining program runtime, the goal of data profiling is to help you understand the essence of your data over the course of an analysis.

I will present the design and implementation of our research to develop the open-source tool AutoProfiler, an extension to JupyterLab that watches for pandas dataframes in memory and shows data profiles automatically. These data profiles provide a starting point for understanding what is in a dataframe and guiding the next steps in analysis; importantly, they alert users to potential issues in their dataset before those issues make it into downstream models or jeopardize the results of an analysis. I will describe results from several academic studies of our tool that demonstrate how interactive data profilers help users discover dataset insights and issues. Finally, I will discuss some of our ongoing research to extend the idea of data profiling beyond tabular data to other types of datasets, like text data, that are particularly hard to profile.

This poster will be relevant for people who do data work in Python: they will learn about the importance of profiling their data, what sorts of issues to check for in their data, and how tools like AutoProfiler can help make this process easier.


PyZombis: teaching programming, SQL, web UI and pygame completely in your browser

Sunday 10 a.m.–1 p.m. in Expo Hall A

PyZombis is an introductory programming course using Python that covers basic concepts as well as more advanced topics like databases, user interfaces, and games.

Lectures are complemented with online activities like code visualization and interactive exercises. Chapters include challenges such as a hangman game and a zombie-chaser game.

Everything can be run in a browser (even offline), without needing to install Python locally or server-side!

Motivations: * Produce simpler Open Educational Resources that can be easily adapted for teachers/students with diverse needs * Zero footprint: avoid server operation costs and maintenance burden (ideal for schools without infrastructure or a good internet connection) * Universal static website: no installation required (learn Python on cell phones, tablets, etc.)

This poster will illustrate the different tools and overall design, including: * Brython: Python 3.10 in your browser * SQLite and a DBAPI wrapper around sqlite.js * Graphical user interface with web widgets * Pygame port for Brython / JavaScript helpers (game.js) * Architecture: comparing Open edX, Moodle, and Runestone vs a static site * Continuous integration with GitHub Actions and GitHub Pages

Credits:

This project is an adaptation of a successful Massive Open Online Course (MOOC), "Python para Zumbis" by Fernando Masanori, which had more than 120,000 enrolled students and a great completion rate.

Thanks to the Python Software Foundation, the project has participated in Google Summer of Code 2019, 2021, 2022 and 2023 (under the Python Argentina sub-org). Several collaborators from other countries have contributed code: Venezuela, Mexico, Colombia and India.

Repository: https://github.com/PyAr/PyZombis


Breaking Barriers in Research Projects: BeagleTM, a Powerful Python-based Text Mining Tool for Visual Discovery in Scientific Literature

Sunday 10 a.m.–1 p.m. in Expo Hall A

Scientific endeavors cannot be completed in isolation from the literature. Seminal articles serve to maintain and further motivate project development in terms of knowledge, relevance, structure and methodology. Determining key articles to include in a literature review is not a simple task for aspiring researchers in academia. Knowledge diffusion may also be hindered by the use of jargon and esoteric writing styles that are meaningful only to seasoned factions of the field, especially when keywords shift over time.

To address these challenges and promote inclusivity in various areas of research, we introduce BeagleTM, a robust, user-friendly Python-based literature text mining tool. BeagleTM creates visual and interactive "Relationship Networks" to display user-requested information from articles, networked to other articles sharing overlapping knowledge. Such networks allow for the discovery of articles related to a project's topic while providing associations to other related articles. To create Relationship Networks, users provide BeagleTM with specific keywords to mine throughout a corpus of millions of peer-reviewed published articles.

BeagleTM also provides visual "Connectivity Networks", which give a quick overview of the number of interconnected articles containing coverage of specific or popular keywords across the literature. Connectivity Networks are particularly helpful for novice researchers assessing the level of research that likely exists in the corpus concerning their supplied keywords. Using BeagleTM, aspiring researchers are assisted in discovering and locating project-relevant articles, while ascertaining the associated works that could motivate their research project through effective literature reviews.


Addressing Reverse Kinematics Challenges and Geometric Optimization Complexity in Robotics through Reinforcement Learning: Project GORLA

Sunday 10 a.m.–1 p.m. in Expo Hall A

The complexities of reverse kinematics in robotics programming and geometric optimization present substantial challenges in various industries such as manufacturing, construction, materials engineering, and health sciences. Many projects in these industries need to address both obstacles to be successful. The Geometric Optimization Reinforcement Learning Algorithm (GORLA) introduces a novel approach that will be of interest to automation specialists and machine learning practitioners in the Python community. The algorithm employs reinforcement learning (RL) to tackle both challenges simultaneously.

The primary objective was to train a robot arm in a virtual environment to maximize coverage within a specified area, a task which inherently attempts to solve the reverse kinematics problem and optimize the geometric configuration. The project involved the creation of a custom RL environment in the Python programming language to draw out the specific expected behaviors post-training.

Results show the feasibility of training a model to optimize area coverage. Validation rounds demonstrated the trained agent's superior ability to cover more area in a specified number of steps compared to a sequence of random actions. The proportion of covered area consistently exceeded that of a random agent, showcasing the efficacy of an RL model in optimizing geometric goals and orchestrating the requisite robotic movements.

The poster visually showcases Python's capability to manage complexity and validates the effectiveness of Python in addressing challenges within robotic and machine learning applications, thereby enhancing precision and efficiency across industries. It outlines the project's structure, methodology of using Python for high-performance computing, successful model outcomes, and potential future directions for development.


Business Processes using BPMN and Python

Sunday 10 a.m.–1 p.m. in Expo Hall A

SpiffWorkflow is an open source Python library that can be used to interpret and execute BPMN diagrams (think of them as very powerful flow charts) to accomplish what would otherwise require writing a lot of complex business logic in your code. You can use these diagrams to accomplish a number of tasks, such as: * Creating an online questionnaire which changes depending on the answers to previous questions; * Building a complex approval process that needs to be handed off between multiple people; * Allowing non-developers to make iterative changes to the flow of an application over time.
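
A hedged sketch of driving such a diagram from Python is shown below; the import paths and method names follow older SpiffWorkflow examples and may differ in current releases, and the BPMN file and process id are placeholders.

```python
from SpiffWorkflow.bpmn.parser.BpmnParser import BpmnParser
from SpiffWorkflow.bpmn.workflow import BpmnWorkflow

parser = BpmnParser()
parser.add_bpmn_file("questionnaire.bpmn")        # diagram drawn in a BPMN editor
spec = parser.get_spec("questionnaire_process")   # process id inside the diagram

workflow = BpmnWorkflow(spec)
workflow.do_engine_steps()                        # run automated tasks until user input is needed

for task in workflow.get_ready_user_tasks():
    task.data["answer"] = "yes"                   # in a real app, collected from a form
    workflow.complete_task_from_id(task.id)
    workflow.do_engine_steps()
```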


Engineering Formality and Software Risk in Debian Python Packages

Sunday 10 a.m.–1 p.m. in Expo Hall A

While Free/Libre and Open Source Software (FLOSS) is critical to global computing infrastructure, the maintenance of widely-adopted FLOSS packages is dependent on volunteer developers who select their own tasks. The misalignment of community engineering supply and project demands --- known as underproduction --- has led to code base decay and subsequent cybersecurity incidents such as the Heartbleed and Log4Shell vulnerabilities. Although some prior work suggests that becoming a more formal organization decreases project risk, other work suggests that engineering formalization may actually increase the likelihood of project abandonment. We evaluate the relationship between underproduction and formality, focusing on organizational structure, developer responsibility, and work processes management. Analyzing 182 Python-language GNU/Linux packages made available via the Debian distribution, we find that although more formal community structures are associated with higher underproduction, elevated developer responsibility is associated with less underproduction; the relationship between formal work process management and underproduction is not statistically significant. Our analysis suggests that a FLOSS organization's structural formalization may provoke unintended and diverging consequences which must be carefully managed. We hope that such analyses can assist FLOSS projects in processes of community maturation.


Auditing Gender-Inequality in Healthcare Systems using Natural Language Processing: A Case Study of Sub-Saharan Africa.

Sunday 10 a.m.–1 p.m. in Expo Hall A

This project describes and evaluates a novel active learning approach for incrementally improving the accuracy of a Natural Language Processing (NLP) model, while optimizing for gender-equitable outcomes in healthcare systems. The approach employs an iterative cyclic model, incorporating data annotation using NLP, human auditing to improve the annotation accuracy (especially for data with demographic segmentation), testing on new data (with intentional bias favoring underperforming demographics), and a loopback system for retraining the model and applying it to new data.

We describe experimental integration of the audit tool with distinct NLP tasks in two separate contexts: i.) annotation of medical symptoms collected in Hausa and English languages based on responses to a research questionnaire about health access in Northern Nigeria; ii.) message intent classification in English and Swahili languages based on spontaneous user messages to a health guide chatbot in both Nigeria and Kenya.

Baseline results showed an equity gap in both precision (P) and recall (R): p=.725 and r=.676 for the over-represented class versus p=.669 and r=.651 for the under-represented class. Application of the active learning tool and workflow mitigated this gap after three increments of auditing and retraining (p=.721 and r=.760 for the under-represented class).

Our findings indicate that this gender-aware audit workflow is language agnostic and capable of mitigating demographic inequity while improving overall system accuracy.


Building accessible websites with Wagtail and Django

Sunday 10 a.m.–1 p.m. in Expo Hall A

An astonishing 96.3% of websites have accessibility issues. While "accessibility" is often viewed as a bolt-on fix for users with disabilities, good accessibility makes a site easier to use for everyone and can have serious benefits for sales, marketing, and customer experience.

Wagtail is an open source content management system built with Python and Django. In recent years, Wagtail has made huge investments in accessibility. In this poster, we will show some of the background research, major accessibility improvements in Wagtail, and how to leverage these capabilities in Wagtail to improve your own website's accessibility.

Our poster will cover how to get started with the CMS, and a suite of common accessibility considerations, including: - Built-in automated accessibility checks (and how to do this even without using Wagtail) - Alt text – why it matters, how to do it well, opportunities to use generative AI - Tools from the Python ecosystem to help with accessibility


Hacker Forum Topics and Networks: A Machine Learning & Qualitative Text Analysis Approach for Cybersecurity

Sunday 10 a.m.–1 p.m. in Expo Hall A

As technology evolves, cybercriminals pose numerous security threats in cyberspace through various modes of cyberattack. Hackers openly share knowledge and hacking tools within dark web and surface web forums and marketplaces. Additionally, cybercriminals buy, sell, and trade hacked or stolen data in these forums. In this research, we apply machine learning techniques and qualitative text analysis to automatically classify and cluster hacker forums, users, and topics using topic models and networks. We begin by automatically classifying topics and key users, then select sample topics and representative texts for in-depth analysis using qualitative methods. Results show that hacker forums discuss hacking tools and tutorials, antivirus, passwords, source code, learning programming languages, and selling stolen data. Our project demonstrates the significance of applying real-time Python text and network analysis techniques to analyze, classify, detect, and monitor threat activities, actors and networks, emerging operations, tactics, and trends from hacker forums for cybersecurity and cyber threat intelligence.
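
As a generic illustration of the topic-modelling step (not the study's code or data), the sketch below clusters a few toy forum posts into topics with scikit-learn's LDA implementation.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

posts = [
    "selling fresh dumps and stolen card data",
    "tutorial: cracking passwords with custom wordlists",
    "which antivirus evasion tools still work?",
]

vectorizer = CountVectorizer(stop_words="english", max_features=5000)
dtm = vectorizer.fit_transform(posts)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(dtm)
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_terms = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"Topic {k}: {', '.join(top_terms)}")
```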


Exploring Single-Cell RNAseq Data Using Python

Sunday 10 a.m.–1 p.m. in Expo Hall A

Today it is possible to obtain genome-wide transcriptome data from single cells using high-throughput sequencing. Single-cell RNA sequencing (scRNA-seq) has significantly advanced our knowledge of biological systems. The great potential of this technology has motivated computational biologists to develop a range of analysis tools. However, until recently most analysis tools were developed in the R programming language.

Recently, a flourishing body of Python-based computational tools has made it easier to robustly analyze single-cell omics datasets in a scalable and reproducible way. Here we will dive into an analysis of a single-cell RNA-sequencing dataset with Scanpy and scvi-tools, two popular Python libraries for general-purpose analysis tasks.

Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing. The Python-based implementation efficiently deals with datasets of more than one million cells. scvi-tools (single-cell variational inference tools) is a package for probabilistic modeling of single-cell omics data, built on top of PyTorch and AnnData. The package hosts implementations of several models that perform a wide range of single-cell data analysis tasks, as well as the building blocks to rapidly prototype new probabilistic models.
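
A condensed version of the standard Scanpy workflow walked through on the poster looks roughly like the sketch below; the input file and parameter values are illustrative placeholders, not recommendations.

```python
import scanpy as sc

adata = sc.read_h5ad("pbmc.h5ad")                  # AnnData object: cells x genes

# Quality control and normalisation
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000)

# Dimensionality reduction, neighbourhood graph, clustering, and embedding
sc.tl.pca(adata)
sc.pp.neighbors(adata)
sc.tl.leiden(adata)
sc.tl.umap(adata)
sc.pl.umap(adata, color="leiden")
```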

The goal of this poster is to help PyCon attendees from all backgrounds feel empowered to explore scRNA-seq data. Specifically, we hope attendees leave with the ability to: * Understand a general workflow for dealing with scRNA-seq data * Anticipate and avoid some of the most common pitfalls in scRNA-seq analysis * Build intuition around the tradeoffs inherent in analytical choices * Feel comfortable and confident working with current Python-based tools for single cell analysis * Know where to find additional information and assistance


Arlington Tech's NASA ADC

Sunday 10 a.m.–1 p.m. in Expo Hall A

A team of high school students from Arlington Tech are taking part in an App Development Challenge from NASA. Our diverse group of mostly women developers are working together to visualize an area of the moon and create a path for astronauts to search for potential resources.

The majority of the students in our group are taking Python programming classes. As soon as we found out about the challenge and what it entailed, we knew that the best way to solve the problem was with Python. We explored multiple modules and libraries such as Plotly, PyVista and Ursina. In the end we decided on the Ursina game engine. Using Ursina, our team has created both a technological and creative way to utilize lunar terrain data and track an efficient path using certain points on the moon. The app we will be presenting shows the landing region we chose, with a path of communication checkpoints that we mathematically formulated inside the landing region.
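
A minimal Ursina sketch of the general idea (not the team's app) is shown below: render a stand-in surface and place checkpoint markers along a path. The coordinates are hypothetical, and the real app builds its terrain and path from lunar elevation data.

```python
from ursina import EditorCamera, Entity, Ursina, color

app = Ursina()

# Stand-in for the lunar landing region.
ground = Entity(model="plane", scale=(50, 1, 50), color=color.gray, collider="box")

# Hypothetical communication-checkpoint coordinates along a computed path.
path = [(-20, 0.5, -20), (-5, 0.5, -2), (10, 0.5, 8), (20, 0.5, 18)]
for x, y, z in path:
    Entity(model="sphere", color=color.orange, scale=0.8, position=(x, y, z))

EditorCamera()   # free-look camera for inspecting the scene
app.run()
```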


All the ways to make matplotlib plots look good by default

Sunday 10 a.m.–1 p.m. in Expo Hall A

Intro

All the default plot settings in matplotlib are stored in the so-called rcParams configuration. There are multiple ways those settings can be modified to give plots a more modern look. Creating a unique but uniform style for all of your reports and branding is important, and knowing how to set it up once and then simply get on with plotting is a powerful skill.

Aim

To demonstrate and teach how rcParams can be changed on the fly, or on a more permanent basis, to achieve good-looking plots.

Audience

Professionals of all levels looking to improve their data visualisation skills.

Content

Way 1

Create a dictionary of rcParams settings at the top of the Jupyter notebook. This will modify the settings for all the plots in that notebook.
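
For example (the specific keys and values below are just one possible house style):

```python
import matplotlib.pyplot as plt

plt.rcParams.update({
    "figure.figsize": (8, 4.5),
    "axes.spines.top": False,
    "axes.spines.right": False,
    "axes.grid": True,
    "grid.alpha": 0.3,
    "font.family": "sans-serif",
})
# Every subsequent plot in this notebook picks up these settings.
```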

Way 2

Modify rcParams using a context manager (with). This temporarily changes the appearance of plots for as long as we stay inside the context manager. This way is good for one-off deviations from, or modifications of, your established style.
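
For example:

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

with mpl.rc_context({"lines.linewidth": 3, "axes.titlesize": 16}):
    plt.plot([1, 2, 3], [2, 4, 1])
    plt.title("Uses the temporary style")
    plt.show()

plt.plot([1, 2, 3], [2, 4, 1])   # back to the notebook-wide settings
plt.show()
```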

Way 3

Modify rcParams using the @mpl.rc_context() decorator.
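
For example (rc_context works as a decorator in recent matplotlib versions):

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

@mpl.rc_context({"lines.linestyle": "--", "lines.linewidth": 2})
def plot_trend(values):
    plt.plot(values)
    plt.show()

plot_trend([1, 3, 2, 5])   # drawn with the decorator's settings only
```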

Way 4

Define your own style sheet and load it when needed.
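
For example, with a hypothetical mystyle.mplstyle file:

```python
# mystyle.mplstyle (plain text file):
#   axes.grid: True
#   figure.figsize: 8, 4.5
#   axes.prop_cycle: cycler("color", ["4C72B0", "DD8452", "55A868"])
import matplotlib.pyplot as plt

plt.style.use("./mystyle.mplstyle")
# Styles can also be layered: plt.style.use(["ggplot", "./mystyle.mplstyle"])
```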

The poster will also demonstrate a way to restore default settings when needed.
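
For example:

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

mpl.rcdefaults()                           # reset rcParams to matplotlib's built-in defaults
# or, equivalently:
plt.rcParams.update(plt.rcParamsDefault)
```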


Python for Conservation: Assessing Night Sky Quality in National Parks

Sunday 10 a.m.–1 p.m. in Expo Hall A

The US National Park Service (NPS) was established with a mission to protect park resources and values. Many visitors come to national parks to enjoy nature, including being immersed in the starry night sky. A naturally dark sky is a critical element for natural, cultural, educational, and visitor-experience reasons. However, light pollution is diminishing the quality of the night sky; light sources from distances of up to 200 miles have the potential to brighten the night. To assist national parks in the management of natural resources, the NPS Natural Sounds and Night Skies Division provides scientific, technical, and administrative support. Measuring and monitoring light pollution is the first step in preserving natural night skies in parks. We use specialized cameras to capture images of night skies and have built a Python pipeline to analyze these data. The pipeline currently reads in raw images and outputs a calibrated fisheye view of the night sky. Future pipeline features will include modeling the natural sky brightness to separate out light from artificial sources and deriving sky quality indicators. We are exploring an object-oriented approach for modeling the natural sky and machine-learning capabilities for image sorting. This pipeline will help us answer questions like how much light is coming from natural sources and how much is from artificial sources. The NPS is using Python to assess resource conditions and fulfill its mission of conserving resources unimpaired for the enjoyment of future generations. We also hope our open-source pipeline will benefit other scientists worldwide in preserving natural dark skies.


Fine-Tuning Large Language Models with Declarative ML Orchestration

Sunday 10 a.m.–1 p.m. in Expo Hall A

LLMs used in tools like ChatGPT are everywhere; however, only a few organizations with massive computing resources can train such large models. While eager to fine-tune these models for specific applications, the broader ML community often grapples with significant infrastructure challenges. Another significant challenge is keeping these LLMs up to date, which requires techniques like retrieval-augmented generation (RAG) and external data storage.

In the poster, the audience will see how open-source ML tooling like Flyte can provide a declarative specification of the infrastructure required for a wide array of ML workloads, including the fine-tuning of LLMs, even with limited resources. Attendees will learn how to leverage Flyte's capabilities to streamline their ML workflows, overcome infrastructure constraints, reduce cost, and unlock the full potential of LLMs for their specific use cases, making it easier for a larger audience to leverage and train LLMs. A minimal sketch of this declarative style appears after the list of main points below.

Main Points Covered in the Poster: - The infrastructure requirements and challenges of fine-tuning LLMs - How to use retrieval-augmented generation to keep LLMs up to date - Using modern techniques for fine-tuning LLMs, like 8-bit quantization and LoRA - How the declarative specifications and abstractions of open-source ML orchestration frameworks like Flyte can automate and simplify infrastructure setup - Leveraging open-source tooling to specify ML workflows for fine-tuning large language models - How Flyte can reduce infrastructure costs and optimize resource usage
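
As a hedged sketch of what such a declarative specification looks like with flytekit's decorators, consider the skeleton below; the resource figures, task bodies, and names are placeholders rather than a working fine-tuning job.

```python
from flytekit import Resources, task, workflow

@task(requests=Resources(gpu="1", mem="32Gi"), retries=2)
def fine_tune(base_model: str, dataset_uri: str) -> str:
    # Load the base model, apply LoRA / 8-bit quantization, train, and return a
    # URI to the fine-tuned weights (details omitted in this sketch).
    return f"s3://models/{base_model}-finetuned"

@task(requests=Resources(cpu="2", mem="8Gi"))
def evaluate(model_uri: str) -> float:
    return 0.0   # placeholder evaluation metric

@workflow
def finetune_pipeline(base_model: str, dataset_uri: str) -> float:
    model_uri = fine_tune(base_model=base_model, dataset_uri=dataset_uri)
    return evaluate(model_uri=model_uri)
```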

Through this poster, attendees will understand how open-source ML orchestration tooling can unlock the full potential of large language models by making their fine-tuning easier and more accessible, even with limited resources. This will enable a larger community of researchers and practitioners to leverage and train large language models for their specific use cases.