Kyle Otstot 👨‍🔬

Incoming Machine Learning Engineer @ TikTok · Mountain View, CA · kotstot@asu.edu

Hello world 👋 welcome to my website! I am 23 years old and recently graduated from Arizona State University with an MS degree in computer science. Through my research, coursework, and personal projects, I have gained experience in a range of machine learning (ML) fields, including computer vision, natural language processing, recommender systems, and time-series forecasting. Most recently, I worked at Spotify as a Machine Learning Engineer Intern, where I developed a new dataset filtering technique that improved the online performance of a model responsible for playlist track recommendation. Before that, I worked as a Graduate Research Assistant in Sankar Lab at ASU, where we developed robust methods for two common ML problems: domain adaptation and generative adversarial network (GAN) training instability. Next month, I will join TikTok as a Machine Learning Engineer in Mountain View, California! Hope you enjoy my website, and don't forget to connect with me on social media! (links below) [Last updated: 12/23]


Education 📚

Master of Science, Computer Science 🖥

Arizona State University

Thesis: Towards Addressing GAN Training Instabilities: Dual-Objective GANs with Tunable Parameters

Key Courses: Data Mining, Statistical Machine Learning, Semantic Web Mining, Data Visualization, ML Security & Fairness, Cloud Computing

GPA: 4.0

August 2022 - July 2023

Bachelor of Science, Computer Science 💻 🎓

Barrett, the Honors College - ASU

Thesis: A Graph-Based Machine Learning Approach to Realistic Traffic Volume Generation

Awards: Recipient of the 2021-22 William E. Lewis Excellence in Computer Science Engineering Scholarship and the ASU President's Scholarship; Summa Cum Laude; Dean's List (2019-2022)

Key Courses: Object-Oriented Programming, Digital Design Fundamentals, Intro to Programming Languages, Assembly Language, Data Structures & Algorithms, Software Development, Information Assurance, Principles of Programming Languages, Operating Systems, Theoretical Computer Science, Computer Networks, Social Media Mining

GPA: 4.0

August 2019 - May 2022

Bachelor of Science, Mathematics ✖ ➗

Barrett, the Honors College - ASU

Awards: Phi Beta Kappa Inductee, Wexler Mathematical Sciences Senior Dinner Invitee, Summa Cum Laude, Dean's List (2019-2022)

Key Courses: Discrete Math Structures, Applied Linear Algebra, Mathematical Structures, Advanced Calculus, Advanced Linear Algebra, Graph Theory, Linear Optimization, Scientific Computing, Computational Methods for Image Processing

GPA: 4.0

August 2019 - May 2022

International Baccalaureate (IB) Diploma 📜

Desert Mountain High School

Awards: Salutatorian (rank 2/536), AP National Scholar, IB Diploma recipient

GPA: 4.92 · SAT: 1580

August 2015 - May 2019

Experience 👨‍💻

Machine Learning Engineer Intern 🎶

Spotify - Personalization Mission

During this internship, I worked on improving a deep learning classifier that contributes to the "personalization" of playlists, i.e., scoring tracks tailored to each user.

  • I proposed a novel dataset filtering method to improve the aforementioned classification model, which increased the percentage of users with long listens by 0.7%.
  • In doing so, I designed an experiment to Bayesian-optimize the filtering method (see the sketch after this list), and created a Ray pipeline that reduced the model training runtime from hours to seconds.
  • Additionally, I contributed to a Kubeflow training pipeline deployed in production for an A/B test, which proved successful (see the first bullet).
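For the curious, here is a rough outside-the-firewall illustration of that experiment design. The production pipeline used Ray and internal tooling, so scikit-optimize stands in here, and evaluate_filtered_model is a hypothetical helper; treat this as a sketch of the idea, not the actual code.

```python
from skopt import gp_minimize

def evaluate_filtered_model(threshold):
    # Hypothetical stand-in: the real helper would filter the training
    # set at `threshold`, retrain the classifier, and return a
    # validation metric. A toy concave curve keeps the sketch runnable.
    return 1.0 - (threshold - 0.6) ** 2

def objective(params):
    (threshold,) = params
    return -evaluate_filtered_model(threshold)  # gp_minimize minimizes

result = gp_minimize(objective, dimensions=[(0.0, 1.0)], n_calls=25,
                     random_state=0)
print("best threshold:", result.x[0])
```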
June 2023 - September 2023

Graduate Research Assistant 👨‍🔬

Sankar Lab - Arizona State University

Having completed my Bachelor's degrees in Spring 2022, I continued working under Dr. Lalitha Sankar in her lab on a new project concerning GAN training stability.

  • The summer before graduate school, I began a new research project with other members of Sankar Lab, funded by the 2022 SURI program. In this project, we focused on the use of alternative objective functions (i.e., not binary cross entropy) for generative adversarial network (GAN) training, in order to help stabilize convergence and lessen the network's performance dependency on random weight initializations. In doing so, we considered alpha-loss, a family of tunable loss functions that encapsulates cross entropy and other notable losses (see the definition after this list). Over the summer, we showed empirically (and, to a lesser extent, theoretically) that some instances of alpha-loss help reduce the common GAN training threats of vanishing gradients, exploding gradients, and mode collapse. Experiments were primarily done on 2D toy datasets with fully-connected generator and discriminator models.
  • At the beginning of the 2022-2023 school year, I was officially appointed as an ASU Graduate Research Assistant (GRA) to continue my research on GAN training instabilities. Having first observed performance gains on the 2D toy datasets, we moved to the benchmark image datasets Stacked MNIST, Celeb-A, and LSUN, this time using deep convolutional generative adversarial nets (DCGANs) with state-of-the-art optimization (Adam) and objective functions (least squares & alpha-loss). As a result, I co-authored the paper \((\alpha_{D}, \alpha_{G})\)-GANs: Addressing GAN Training Instabilities via Dual Objectives, which was accepted to the 2023 IEEE International Symposium on Information Theory (ISIT) and the New Frontiers in Adversarial Machine Learning workshop at the 2023 International Conference on Machine Learning (ICML).
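For reference, here is a common parameterization of \(\alpha\)-loss (exact conventions vary slightly across papers, so treat this as the general shape): for a predicted probability \(\hat{p} \in (0,1]\) assigned to the true label,

\[ \ell_{\alpha}(\hat{p}) = \frac{\alpha}{\alpha - 1}\left(1 - \hat{p}^{\frac{\alpha - 1}{\alpha}}\right), \qquad \alpha \in (0, \infty), \]

which recovers cross entropy (\(-\log \hat{p}\)) as \(\alpha \to 1\) and the soft 0-1 loss \(1 - \hat{p}\) as \(\alpha \to \infty\).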
June 2022 - May 2023

Undergraduate Research Intern 🧪

Sankar Lab - Arizona State University

At the end of the Spring 2021 semester, I began my undergraduate research under the supervision of Dr. Lalitha Sankar and her ASU lab. Our work primarily focused on the robustness of (classification) loss functions under train-time label noise.

  • In the summer of 2021, I was accepted into ASU's Summer Undergraduate Research Initiative (SURI) program, which funded my work with Sankar Lab. Over the 8-week program, I learned how to design image classification experiments with PyTorch and ASU's shared cluster environment (Agave); specifically, I implemented experiments with benchmark clean/corrupted datasets (CIFAR-10/100, CIFAR-10/100-C), deep model architectures (WideResNet), state-of-the-art optimization techniques (cosine learning rate annealing), and robust loss functions (alpha-loss, NCE+RCE, focal loss). At the end of the program, I compiled and presented my work, titled Robustness of a Tunable Loss Function Family on Corrupted Datasets, which can be found on this website.
  • For the 2021-2022 school year, I was awarded a Research Experience for Undergraduates (REU) position funded by the National Science Foundation (NSF) to continue my research with Sankar Lab. During this time, we built on our summer work by developing a method (named AugLoss) that combines data augmentation and robust loss functions to simultaneously mitigate the threats of test-time distribution shifts and train-time noisy labeling. We showed that our method can outperform previous state-of-the-art methods on (feature + label) corrupted versions of the benchmark CIFAR-10/100 datasets. As a result, I first-authored the paper AugLoss: A Learning Methodology for Real-World Dataset Corruption, which was accepted to the Principles of Distribution Shift workshop at the 2022 International Conference on Machine Learning (ICML). In July 2022, I presented our work at this ICML workshop in Baltimore, Maryland.
April 2021 - June 2022

Undergraduate Teaching Assistant 👨‍🏫

Fulton Schools of Engineering - ASU

For the Fall 2020, Spring 2021, and Fall 2021 semesters, I worked as a TA for the Probability & Statistics for Engineers (IEE 380) course under Dr. Michael Clough in ASU's Fulton Schools of Engineering. This course covers the fundamentals of probability theory, descriptive statistics, one/two sample hypothesis testing, simple/multiple linear regression, and statistical quality control. As a TA, my primary responsibilities included creating/presenting exam reviews (see this midterm review and final exam review for examples), holding semi-weekly office hours, monitoring discussion boards, and proctoring exams. Additionally, I developed a website called Hypothetest that aims to help students solve and visualize their two-sample hypothesis test problems.

August 2020 - December 2021

Instructional Aide 🎒

School of Mathematical & Statistical Sciences - ASU
  • In the Summer 2021 semester, I worked for one section of Calculus for Engineers II (MAT 266) in ASU's School of Mathematical & Statistical Sciences. My primary responsibilities included holding office hours and answering questions on the discussion board.
  • Also in the Summer 2021 semester, I worked for one section of Elementary Linear Algebra (MAT 242) in the same school. My primary responsibilities included monitoring and answering questions on the discussion board.
  • Lastly, across the Spring 2021 and Summer 2021 semesters, I worked for two sections of Math for Business Analysis (MAT 211) in the same school. My primary responsibilities included holding office hours and answering questions on the discussion board.
March 2021 - December 2021

Student Grader 📝

Arizona State University
  • Across the Summer 2021 and Fall 2021 semesters, I graded for 7 Discrete Math Structures (MAT 243) courses in ASU's School of Mathematical & Statistical Sciences. My primary responsibilities included holding office hours, as well as grading homework assignments and quizzes.
  • Additionally, in the Fall 2021 semester, I graded for one Intro to Theoretical Computer Science (CSE 355) course in ASU's School of Computing & Augmented Intelligence. My primary responsibilities included holding office hours, as well as grading recitations and homework assignments.
June 2021 - December 2021

Projects 📊

Towards Addressing GAN Training Instabilities: Dual-Objective GANs with Tunable Parameters 🎛 🤏

Robust Generative Models, Convolutional Nets, Fréchet Inception Distance

This thesis introduces the \((\alpha_{D}, \alpha_{G})\)-GAN, a parameterized class of dual-objective GANs, as an alternative approach to the standard vanilla GAN. The \((\alpha_{D}, \alpha_{G})\)-GAN formulation, inspired by \(\alpha\)-loss, allows practitioners to tune the parameters \((\alpha_{D}, \alpha_{G}) \in [0,\infty)^{2}\) to provide a more stable training process. The objectives for the generator and discriminator in \((\alpha_{D}, \alpha_{G})\)-GAN are derived, and the advantages of using these objectives are investigated. In particular, the optimization trajectory of the generator is found to be influenced by the choice of \(\alpha_{D}\) and \(\alpha_{G}\). Empirical evidence is presented through experiments conducted on various datasets, including the 2D Gaussian Mixture Ring, Celeb-A image dataset, and LSUN Classroom image dataset. Performance metrics such as mode coverage and Fréchet Inception Distance (FID) are used to evaluate the effectiveness of the \((\alpha_{D}, \alpha_{G})\)-GAN compared to the vanilla GAN and state-of-the-art Least Squares GAN (LSGAN). The experimental results demonstrate that tuning \(\alpha_{D} < 1\) leads to improved stability, robustness to hyperparameter choice, and competitive performance compared to LSGAN.
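As a rough sketch of those objectives (signs and saturating/non-saturating conventions follow the thesis; this is just the general shape built from \(\alpha\)-loss): the discriminator ascends \(V_{\alpha_{D}}\) while the generator descends \(V_{\alpha_{G}}\), where

\[ V_{\alpha}(\theta, \omega) = \mathbb{E}_{X \sim P_{r}}\left[ \frac{\alpha}{\alpha - 1}\left( D_{\omega}(X)^{\frac{\alpha - 1}{\alpha}} - 1 \right) \right] + \mathbb{E}_{X \sim P_{G_{\theta}}}\left[ \frac{\alpha}{\alpha - 1}\left( \left(1 - D_{\omega}(X)\right)^{\frac{\alpha - 1}{\alpha}} - 1 \right) \right], \]

and setting \(\alpha_{D} = \alpha_{G} = 1\) recovers the vanilla GAN value function \(\mathbb{E}[\log D_{\omega}(X)] + \mathbb{E}[\log(1 - D_{\omega}(X))]\).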

Paper   ·   Presentation   ·   Github

February 2023 - June 2023

\((\alpha_{D}, \alpha_{G})\)-GANs: ✌ ✅ Addressing GAN Training Instabilities via Dual Objectives

Robust Generative Models, Convolutional Nets, Fréchet Inception Distance

In an effort to address the training instabilities of GANs, we introduce a class of dual-objective GANs with different value functions (objectives) for the generator (\(G\)) and discriminator (\(D\)). In particular, we model each objective using \(\alpha\)-loss, a tunable classification loss, to obtain \((\alpha_{D}, \alpha_{G})\)-GANs, parameterized by \((\alpha_{D}, \alpha_{G}) \in [0, \infty)^{2}\). For a sufficiently large number of samples and sufficient capacities for \(G\) and \(D\), we show that the resulting non-zero-sum game simplifies to minimizing an \(f\)-divergence under appropriate conditions on \((\alpha_{D}, \alpha_{G})\). In the finite sample and capacity setting, we define estimation error to quantify the gap in the generator's performance relative to the optimal setting with infinite samples, and we obtain upper bounds on this error, showing it to be order optimal under certain conditions. Finally, we highlight the value of tuning \((\alpha_{D}, \alpha_{G})\) in alleviating training instabilities for the synthetic 2D Gaussian mixture ring and the Stacked MNIST datasets.
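A minimal PyTorch sketch of how a tunable value function of this shape can be wired into an alternating GAN loop, assuming a sigmoid-output discriminator (the paper's exact sign and saturation conventions may differ):

```python
import torch

def v_alpha(d_real, d_fake, alpha, eps=1e-7):
    # Tunable value function: recovers the vanilla GAN value
    # E[log D(x)] + E[log(1 - D(G(z)))] in the limit alpha -> 1.
    d_real = d_real.clamp(eps, 1 - eps)
    d_fake = d_fake.clamp(eps, 1 - eps)
    if abs(alpha - 1.0) < 1e-6:
        return torch.log(d_real).mean() + torch.log(1 - d_fake).mean()
    c = alpha / (alpha - 1.0)  # note: 1 / c = (alpha - 1) / alpha
    return (c * (d_real ** (1 / c) - 1)).mean() \
         + (c * ((1 - d_fake) ** (1 / c) - 1)).mean()

def gan_step_losses(d_real, d_fake, alpha_d, alpha_g):
    # Discriminator ascends V_{alpha_D}; generator descends V_{alpha_G}.
    loss_d = -v_alpha(d_real, d_fake.detach(), alpha_d)
    loss_g = v_alpha(d_real.detach(), d_fake, alpha_g)
    return loss_d, loss_g
```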

Paper   ·   Poster   ·   Github

November 2022 - January 2023

DiscoNet: 🕺 🕸 Towards Mitigating Shortcut Learning with Cross-Domain Regularization

Domain Adaptation, Shortcut Learning, Generative Adversarial nets

Deep learning methods have achieved remarkable advancements in image classification, but progress has been particularly limited in settings of out-of-distribution testing. As a result, domain adaptation methods have been proposed to robustify models against unforeseen data distribution shifts; however, we find that these methods are inherently vulnerable to shortcut learning, a phenomenon where models rely on spurious cues instead of the true image semantics. In this project, we propose a new domain adaptation method, DiscoNet, that learns a cross-domain mapping between the source and target domains in order to embed each dataset similarly during training. We find that our approach is robust to shortcut learning, which we demonstrate with our novel dataset, Striped MNIST. Overall, we hope to underscore the importance of finding a relationship between source and target datasets when curating new domain adaptation solutions.

Paper   ·   Github

October 2022 - December 2022

Accident-Analyzer: 🚗 🗺 Understanding Vehicle Accident Patterns in the United States

Data Visualization, D3 Javascript library

In this project, we re-imagine CrimAnalyzer, a visualization-assisted analytic tool for crimes in São Paulo, in the context of traffic accidents, ultimately producing Accident-Analyzer. In doing so, we explore the spatio-temporal patterns of traffic accidents across the United States from 2016 to 2021. The Accident-Analyzer system allows users to identify local hotspots, visualize accident trends over time, and filter the data by key weather categories in real time. The visualization was primarily created with the D3 and Leaflet JS libraries, the dataset preprocessing was done with Python, and the data is stored and accessed via a MySQL database.

Paper   ·   Poster   ·   Github

September 2022 - December 2022

A Survey of Deep Learning-Based Movie Recommendation Systems 🎬

Recommendation Systems, Deep Learning

In this work, we provide a comprehensive and systematic analysis of current research methods on deep learning-based movie recommendation systems, specifically with empirical evaluation on the benchmark MovieLens dataset. We also provide a detailed taxonomy and summaries of state-of-the-art algorithms, providing a perspective on the future trends and research challenges of deep learning recommendation systems.

Paper   ·   Presentation   ·   Github

September 2022 - December 2022

Non-Targeted White-Box Evasion Attacks on the Fashion MNIST dataset 🔐👗

Adversarial learning, evasion attacks, deep learning

In this project, I train a convolutional neural net (CNN) with a LeNet-5 architecture on the Fashion MNIST dataset, achieving 99.3% validation accuracy. Then, I implement two white-box evasion attacks, namely the fast gradient sign method (FGSM) and projected gradient descent (PGD), to create adversarial examples that closely resemble the original Fashion MNIST images yet dramatically alter the classification output of the CNN. This project underscores the vulnerability of standard deep learning classifiers to carefully-crafted adversarial images.
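A minimal PyTorch sketch of the two attacks, assuming a trained model that outputs raw logits and inputs normalized to [0, 1] (the epsilon and step values here are illustrative):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.03):
    # Fast gradient sign method: one signed-gradient ascent step
    # on the loss with respect to the input.
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def pgd(model, x, y, eps=0.03, step=0.007, iters=10):
    # Projected gradient descent: iterated signed-gradient steps,
    # projected back onto the L-infinity ball of radius eps around x.
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + step * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv
```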

Report   ·   Github

September 2022 - September 2022

Building Robust and Accurate Transaction Classifiers with Deep Transfer Learning 💰✅

NLP, Gradient Boosting, Deep Transformers, Web Scraping

In this work, I develop a solution for the task of transaction categorization, specifically with the dataset provided by the 2022 Wells Fargo Campus Analytics Challenge. Overall, I make a few noteworthy contributions, including engineering two attributes responsible for improved ML performance, developing a word clustering algorithm that helps the practitioner better understand the relationships between words across categories, and designing a classifier that achieves 88% test accuracy using deep transfer learning and state-of-the-art optimization techniques.

Paper   ·   Github

June 2022 - July 2022

WordleNet: 📱 Training a Recurrent Neural Network to Play the game "Wordle"

Recurrent Neural Nets, Deep Learning

Article (Coming soon)   ·   Github

June 2022 - June 2022

A Graph-Based Machine Learning Approach to Realistic Traffic Volume Generation 🚦 🏙

Machine Learning, Data Visualization, Statistical Analysis

In this work, we explore the potential for realistic and accurate generation of hourly traffic volume with machine learning (ML), using the ground-truth data on Manhattan road segments collected by the New York State Department of Transportation (NYSDOT). Specifically, we address the following question: can we develop an ML algorithm that generalizes the existing NYSDOT data to all road segments in Manhattan? To do so, we introduce a supervised learning task of multi-output regression, where ML algorithms use road segment attributes to predict hourly traffic volume. We consider four ML algorithms (K-Nearest Neighbors, Decision Tree, Random Forest, and Neural Network) and tune hyperparameters by evaluating the performance of each algorithm with 10-fold cross-validation. We also provide insight into quantifying the “trustworthiness” of a model, followed by brief discussions on interpreting model performance, suggesting potential project improvements, and identifying the biggest takeaways.
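A minimal scikit-learn sketch of that setup: random forests natively handle multi-output targets, so each road segment can map to a full vector of hourly volumes. The data here is a synthetic placeholder standing in for the NYSDOT features and targets:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                      # segment attributes (placeholder)
Y = rng.poisson(100, size=(500, 24)).astype(float)  # 24 hourly volumes (placeholder)

# Tune hyperparameters with 10-fold cross validation, as in the report.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10, 20]},
    scoring="neg_mean_squared_error",
    cv=10,
)
search.fit(X, Y)
print(search.best_params_, -search.best_score_)
```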

Paper   ·   Presentation   ·   Github

December 2021 - May 2022

AugLoss: 🌦 🔍 A Learning Methodology for Real-World Dataset Corruption

Domain Adaptation, Robust Loss Functions, Data Augmentation

Deep learning models achieve great success in many domains, but increasingly face safety and robustness concerns, including noisy labeling in the training stage and feature distribution shifts in the testing stage. Previous work has made significant progress in addressing these problems, but the focus has largely been on developing solutions for only one problem at a time. For example, recent work has argued for the use of tunable robust loss functions to mitigate label noise, and data augmentation to combat distribution shifts. As a step towards addressing both problems simultaneously, we introduce AugLoss, a simple but effective methodology that achieves robustness against both train-time noisy labeling and test-time feature distribution shifts by unifying data augmentation and robust loss functions. We conduct comprehensive experiments in varied settings of real-world dataset corruption to showcase the gains achieved by AugLoss compared to previous state-of-the-art methods.
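A hedged sketch of the recipe's shape, not the paper's exact configuration: pair train-time augmentation with a tunable robust loss in place of cross entropy. A simple crop/flip pipeline and an \(\alpha\)-loss-style objective stand in for the paper's particular choices:

```python
import torch
import torch.nn.functional as F
from torchvision import transforms

# Train-time augmentation (a simple stand-in for the paper's choice).
augment = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
])

def alpha_loss(logits, targets, alpha=3.0):
    # Tunable robust loss on the true-class probability; alpha -> 1
    # recovers cross entropy, larger alpha dampens noisy-label gradients.
    p_true = F.softmax(logits, dim=1).gather(1, targets[:, None]).squeeze(1)
    c = alpha / (alpha - 1.0)
    return (c * (1.0 - p_true ** (1.0 / c))).mean()

def train_step(model, optimizer, images, targets, alpha=3.0):
    # One AugLoss-style step: augment the batch, then apply the robust loss.
    optimizer.zero_grad()
    loss = alpha_loss(model(augment(images)), targets, alpha)
    loss.backward()
    optimizer.step()
    return loss.item()
```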

Paper   ·   Presentation   ·   Poster   ·   Github

September 2021 - May 2022

Building a Fact-Checked Article Classifier with Naive Bayes... and more! 🧐

Natural Language Processing, Web Scraping

In this project, I develop an original machine learning (ML) algorithm that classifies the conclusions of fact-checking articles (paired with other data, such as the topic, source, etc.) as one of the following labels: false, misleading, true, or unproven. I was provided a train set of ~6,000 examples and an unlabeled test set of ~700 examples. My objective was to train a model on the train set and predict the labels of the test examples, which were evaluated for accuracy on Kaggle. The two major obstacles in the task were the heavy label imbalance (mostly 0’s) and the need for feature engineering, as very little information was provided in the datasets. As a result, I extracted more features by requesting the HTML page of each article in the dataset, then developed an algorithm (which I call switching) that primarily learns on these new features.
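The core classifier is a standard bag-of-words pipeline; here is a minimal scikit-learn sketch with placeholder data (the real features were scraped from each article's HTML):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Placeholder examples standing in for the scraped article features.
train_texts = ["claim debunked by fact checkers", "study confirms the claim",
               "no evidence either way for the claim", "misleading use of statistics"]
train_labels = ["false", "true", "unproven", "misleading"]

# TF-IDF features over article text, then a multinomial Naive Bayes
# classifier (class priors are fit from the imbalanced train set).
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())
clf.fit(train_texts, train_labels)
print(clf.predict(["fact checkers find no evidence"]))
```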

Report   ·   Github

November 2021 - December 2021

Robustness of a Tunable Loss Function Family on Corrupted Datasets 🏞 🔒

Image Classification, Robust Loss Functions, Data Augmentation

A very important assumption in image classification is that the train and test sets are independent and identically distributed (i.i.d.); when this assumption does not hold, whether on the prior (image distribution shift) or posterior (label noise) side, deep learning models noticeably decline in performance. In this presentation, I discuss two methods, data augmentation and robust loss functions, that address the problems of test-time feature distribution shifts and train-time noisy labeling, respectively. Specifically, I demonstrate the effectiveness of AugLoss (a data augmentation technique) and alpha-loss (a robust loss function) on corrupted versions of the CIFAR-10/100 datasets. Lastly, I combine the two methods and show that this combination achieves even better performance when both the test features and train labels are corrupted.

Presentation   ·   Github

May 2021 - July 2021

Full-Stack Development of a Responsive Travel Blog AND Admin Portal 🛩 🌍

Web Development, Web Design

In this project, I designed and developed a travel blog + admin interface from scratch. The travel blog is fully responsive and features a variety of user-triggered animations, such as handwritten text, photos, and moving backgrounds. The front end was implemented with HTML, CSS, and vanilla JavaScript. Additionally, the admin interface is password-protected and allows the blogger to post content (with automatic image compression), manage the site layout, and send emails to subscribers; these back-end features were primarily implemented with PHP. Lastly, I used SQL with phpMyAdmin to (1) create a newsletter database that collects subscribers’ names, emails, and other useful information, and (2) create an automated traffic monitoring service that allows the blogger to evaluate the performance of the site.

Website   ·   Github

September 2020 - July 2021

Hypothetest: An interactive storyline generator for two-sample hypothesis tests ✏ 📈

Hypothesis Testing, Web Development

As a probability & statistics TA, I designed and developed a website that generates a customized write-up of the student's two-sample hypothesis test problem, including all of the calculations and explanations typically found in a statistics textbook. The design features a simple but responsive layout, implemented with HTML and CSS. The functionality, including navigation of the built-in storyline, is implemented with vanilla JavaScript. The MathJax JS library is used to generate the mathematical notation and equations, and the HighCharts JS library is used to dynamically graph the distributions.
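For illustration, the kind of calculation the storyline walks through can be reproduced from summary statistics with SciPy (the numbers below are made up):

```python
from scipy import stats

# Pooled-variance two-sample t-test from summary statistics.
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=5.2, std1=1.1, nobs1=40,   # sample A (made-up numbers)
    mean2=4.8, std2=0.9, nobs2=35,   # sample B
    equal_var=True,
)
alpha = 0.05
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```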

Website   ·   Github

November 2020 - December 2020

Skills 🤹

Programming Overview 🔍
Deep Learning 🕸
  • Libraries: PyTorch, TensorFlow, NumPy, Ray, Kubeflow
  • Fields: Computer vision, Natural language processing, Signal processing
  • Tasks: Domain adaptation, Image classification, Image generation, Image deblurring, Image-to-image translation, Time-series forecasting, Text classification, Named entity recognition, Recommendation systems, Compressive sensing, Clustering, Dimensionality reduction
  • Architectures: Dense neural nets (MLPs), CNNs, Wide and deep nets, Residual nets, U-Nets, RNNs (LSTMs, Bi-LSTMs), GANs, VAEs, RBMs, Text transformers (BERT, XLNet, GPT), Vision transformers, Diffusion models
  • Datasets: MNIST, Stacked MNIST, Fashion MNIST, MNIST-M, MNIST-C, CIFAR-10/100, CIFAR-10/100-C, CIFAR-10/100-N, Tiny ImageNet, ImageNet, Celeb-A, LSUN, IMDB, MovieLens, Netflix Prize, Spotify Million Playlist
Machine Learning 🤖
  • Libraries: Scikit-Learn, NumPy, Pandas, PySpark, SparkML, NLTK, MATLAB built-ins
  • Tasks: Classification, Regression, Clustering, Association, Recommendation
  • Models: Perceptron/Linear regression, SVMs, KNN, Decision trees, Random Forest, Gradient boosting (XGBoost, LightGBM, CatBoost), Naive Bayes, Matrix factorization, Content-based/collaborative filtering, PCA, SVD, K-Means clustering, DBSCAN, Hierarchical clustering, t-SNE
  • Courses: Statistical Machine Learning, ML Security & Fairness
Data Mining ⛏
  • Python Libraries: BeautifulSoup, Selenium, requests, NLTK, Pandas, PySpark
  • Tasks: Community detection (CPM, Spectral, Modularity), Web ranking (Katz, PageRank), Association rule mining (Apriori), Data transformation (TF-IDF, Word2Vec, GloVe)
  • Courses: Data Mining, Semantic Web Mining, Social Media Mining
Data Visualization 📊
  • Python Libraries: Matplotlib, Seaborn
  • JS Libraries: D3, Leaflet, HighCharts
  • Other tools: Tableau, Power BI
Web Development 💻
  • Client side: HTML, CSS, JavaScript, Pyodide, MathJax
  • Server side: PHP, SQL, MySQL (GCP, AWS), BigQuery, Node.js, Flask
Miscellaneous Skills 🌟
  • Languages: Java, C/C++, MATLAB, Bash, Git
  • An extensive mathematical background through the undergraduate level, including advanced calculus, linear algebra, graph theory, and scientific computing
  • Experience in academic writing, specifically with LaTeX
  • Proficiency in research computing & shared cluster environments with SLURM

Awards 🥇

  • August 2022: Awarded the 1st place prize of $7,500 for winning the 2022 Wells Fargo Campus Analytics Challenge, a nationwide ML competition prompting college students to use state-of-the-art natural language processing (NLP) techniques to develop a transaction categorizer trained on their data. My submission, Building Robust & Accurate Transaction Classifiers with Deep Transfer Learning, is protected by an NDA, so the paper and repository will not be publicly available until August 2023.
  • June 2022: My co-authored paper AugLoss: A Learning Methodology for Real-World Dataset Corruption was one of 40 submissions accepted to the Principles of Distribution Shift workshop at the 2022 International Conference on Machine Learning (ICML). In July 2022, I presented the paper at the ICML workshop in Baltimore, Maryland.
  • April 2022: Selected to join ASU's chapter of Phi Beta Kappa, the nation's oldest and most prestigious honor society for the liberal arts and sciences. Fewer than 2% of ASU's College of Liberal Arts & Sciences (CLAS) graduates are selected to Phi Beta Kappa annually. Inducted on April 29th, 2022.
  • April 2022: Selected to attend the sixth annual Jonathan D. and Helen Wexler Mathematical Sciences Senior Dinner. Only a handful of outstanding seniors in the School of Mathematical and Statistical Sciences (SMSS) are selected annually.
  • December 2021: Selected to receive the 2021-2022 Dr. William E. Lewis Excellence in Computer Science Engineering Scholarship in the approximate amount of $6,199. One student in the Fulton Schools of Engineering is selected to receive this scholarship annually.