Vyom Pathak

I am currently working as a full-time applied scientist at Chronograph in New York and as a part-time research assistant at the Data Science Research Lab at the University of Florida. At Chronograph, I work on applied NLP problems in the financial domain by building large foundation models. At UF, I work on improving faithfulness in multimodal reasoning with large language models using human feedback for the DARPA-funded ECOLE project, under the guidance of Dr. Daisy Wang from UF, Dr. Eric Xing from CMU, and Dr. Zhiting Hu from UCSD.

I graduated from the University of Florida in May 2023 with a master's degree in computer science. At UF, I worked at the junction of deep learning and natural language processing on large-scale data at the Data Science Research Lab as a research assistant under the guidance of Dr. Daisy Wang.

Last year, I interned at Amazon as an applied scientist on the Alexa Smart Home team under the guidance of Sven Eberhardt and Zeya Chen, developing a novel transformer architecture for sequence-to-sequence problems. Before that, I interned at Apple as a machine learning researcher on the Siri Text-to-Speech team under the guidance of Kishore Prahallad and Eoin Murphy, where I developed an end-to-end acoustic model on large-scale speech data.

Before my master's, I interned at the Indian Space Research Organization (ISRO) as a machine learning researcher under the guidance of Dr. Arvind Sahay, developing neural-network-based solutions for oceanographic problems. I received my bachelor's degree in computer engineering from Dharmsinh Desai University in 2021. At DDU, under the guidance of Dr. Brijesh Bhatt, I worked on an automatic speech recognition system for a low-resource language (Gujarati).

I am very much open to different research domains as well as cool, hard machine learning problems. You can contact me through any of the following channels for collaborations.

Email  /  CV  /  LinkedIn  /  Github  /  Google Scholar  /  Kaggle  /  Medium

Seeking full-time Research Engineer opportunities!

profile photo

News
  • [09/2023] Started working as an Applied Scientist at Chronograph.
  • [06/2023] Started working as a Research Assistant at the Data Science Research Lab under the guidance of Dr. Daisy Wang in collaboration with CMU and UCSD.
  • [05/2023] Graduated from the University of Florida with a Master of Science in Computer Science.
  • [12/2022] Started working as a Research Assistant at the Data Science Research Lab under the guidance of Dr. Daisy Wang.
  • [08/2022] Started my fall internship as an Applied Scientist at Amazon Alexa Smart Home.
  • [05/2022] Started my summer internship as a Machine Learning Researcher at Apple Siri Text-to-Speech.
  • [05/2022] Our paper titled Improving Deep Learning based Automatic Speech Recognition for Gujarati got accepted at ACM TALLIP 2022.
  • [12/2021] Our paper titled Neural Network Based Retrieval of Inherent Optical Properties (IOPs) of Coastal Waters of Oceans got accepted at IEEE InGARSS 2021.
  • [09/2021] Started working as a Research Assistant at the Data Science Research Lab under the guidance of Dr. Daisy Wang.
  • [08/2021] Started my master's at the University of Florida majoring in Computer Science.
  • [07/2021] Won Silver Medal (Top 3%) in the CommonLit Readability Prize Prediction NLP competition.
  • [05/2021] Graduated with First Class from Dharmsinh Desai University with a Bachelor of Technology in Computer Engineering.
  • [12/2020] Our paper titled End-to-End Automatic Speech Recognition for Gujarati got accepted at ICON 2020 (ACL Anthology).
  • [12/2020] Started my spring internship as a Machine Learning Researcher at Indian Space Research Organization (ISRO).
  • [08/2020] Appointed as the Head of the Machine Learning Team in the Developer Student Clubs powered by Google.
  • [08/2019] Appointed as a Machine Learning member in the Developer Student Clubs powered by Google.
  • [07/2017] Started my Bachelor of Technology in Computer Engineering at Dharmsinh Desai University.

Research

Much of my research lies at the intersection of speech-related tasks, finetuning large language models, and applications of deep learning in various domains. I have also worked on speech enhancement, automatic speech recognition, natural language understanding, prompting large language models, multimodal reasoning, dialogue managers, text-to-speech, conversational agents, and time series modeling.

improving-asr-guj Improving Deep Learning based Automatic Speech Recognition for Gujarati
Deepang Raval*, Vyom Pathak*, Muktan Patel*, Brijesh Bhatt
ACM Transactions on Asian and Low-Resource Language Information Processing
(ACM TALLIP), May 2022
[Paper] / [Code]

We build upon our previous work and develop higher-order character-level and word-level language models to better constrain beam search decoding. We also provide an ablation study on how to choose a good language model size for beam search decoding.

mnn-isro Neural Network Based Retrieval of Inherent Optical Properties (IOPs) Of Coastal Waters of Oceans
Vyom Pathak, Brijesh Bhatt, Arvind Sahay, Mini Raman
IEEE International India Geoscience and Remote Sensing Symposium
(InGARSS 2021), December 2021
[Paper] / [Code] / [Slides]

We introduced a modified neural network method for deriving Inherent Optical Properties from remote sensing reflectance wavelengths. The proposed method outperforms all previous methods in retrieving each IOP on the in-situ dataset.

e2e-asr-guj End-to-End Automatic Speech Recognition for Gujarati
Deepang Raval*, Vyom Pathak*, Muktan Patel*, Brijesh Bhatt
17th International Conference on Natural Language Processing
(ICON 2020) (ACL Anthology), December 2020
[Paper] / [Code] / [Oral Talk]

We developed a two-tier post-processing technique to correct errors in an end-to-end ASR system for a low-resource language (Gujarati). First, we developed a novel multi-level (character- and word-level) language model to constrain beam search decoding over the ASR output. On top of that, we developed a novel spell-correction model based on multilingual BERT and a word-level language model. Finally, we provide a new and extensive analysis method to understand the errors made by the ASR system for Gujarati.


Projects

Along my journey of understanding machine learning, I have worked on several projects spanning large language modeling, chatbots, protein structure degradation, Kaggle competitions, and digital image processing.

Preliminary Survey on Large Language Models Preliminary Survey on Foundation Language Models
Vyom Pathak
[Code] / [Slides]

Language models are essential components of natural language processing that can capture general representations of language from large-scale text corpora. In recent years, various language models have been proposed, ranging from shallow word embeddings to deep contextual encoders, with different pre-training tasks, training frameworks, adaptation methods, and evaluation benchmarks. The year 2022 saw a surge in the development of large generative models, referred to as "foundation models," that can perform multiple tasks by training on a general unlabeled dataset. However, there is a need for a comprehensive survey that links these models and contrasts their advantages and disadvantages. In this paper, we present a systematic survey of foundation language models that aims to connect various models along multiple criteria, such as representation learning, model size, task capabilities, and open research questions.

mrna-degrad mRNA Vaccine Degradation Prediction
Vyom Pathak*, Rahul Roy*
[Code] / [Slides]

We compared two methods for predicting mRNA vaccine degradation. We investigated graph-based as well as sequence-to-sequence models of RNA structure on the Eterna dataset. Our results show that the graph-based solution better represents the dataset, giving better performance than the sequence-to-sequence model.

mnn-isro Image-based Melanoma Classification using Semi-Supervised Learning
Vyom Pathak*, Sanjana Rao*
[Code] / [Slides]

We present an ensemble of image-only convolutional neural network (CNN) models with different backbones and input sizes, along with a self-supervised model, to classify skin lesions. We use the Bootstrap Your Own Latent (BYOL) model for self-supervision on top of state-of-the-art image recognition models for performance improvements. With this, we show a 2% improvement over the baselines.

zs-bot Zero Shot Schema based Dialogue System
Vyom Pathak*, Amogh Mannekote*, Oluwapemisin Bandy-toyo*
[Code] / [Slides]

We have developed a Schema-based Dialogue System for Zero-Shot Task Transfer which enables designers to quickly and easily transfer the learned behaviors from related tasks/domains to new domains or tasks. We used zero-shot prompt-based Dialogue-GPT2 for the response generation part. We verified the viability of the system as a wire-framing technique through two user studies.

common-lit-nlp CommonLit Readability Prediction
Vyom Pathak*, Muktan Patel*, Deepang Raval
[Post 1] / [Post 2]

We participated in a Kaggle competition to predict the reading difficulty of a passage as a regression score. We developed a novel 2D-attention mechanism to boost performance, along with finetuning several large language models. The final model was an ensemble based on the forward out-of-fold (OOF) method, which helped us win a silver medal in the competition.

guide-bot Guide Bot
Vyom Pathak*, Muktan Patel*, Deepang Raval*, Utsav Parmar*, Sachin Thakkar*
[Video] / [Slides]

For the 'Build for Digital India' program, we built a guide-bot for the Regional Transport Office (RTO), Government of India, under the theme Smart Cities & Infrastructure. The bot answers the queries people commonly have and clears up their confusion, making RTO processes easier and ensuring that correct information is provided.

ML Projects & Paper Reviews ML Algorithms and Paper Reviews
Vyom Pathak

A repository of ML algorithms I have implemented, ranging from regression models and decision trees to CNNs, NLP concepts, and transfer learning. It also houses papers that I have reviewed along with my thoughts on them. I update this project as I learn new things.

pixby Pixby - Photo Editing Windows application
Vyom Pathak
[Video] / [Code]

This application is used for editing photos. It performs digital image processing based on matrix manipulations. The software provides functionalities such as RGB value modification, image styling, flipping along an axis, and rotation, as well as filters such as sepia, greyscale, and invert.


Blog Posts

I write about my research and other interesting things, including my professional journey and other helpful guides, on Medium.

wavey-ai End-To-End Speech Recognition - 3 Part Series
Vyom Pathak*, Deepang Raval*, Muktan Patel*
[Part 1: Understanding Sound]
[Part 2: Simple Audio Feature Extraction for Machine Learning]
[Part 3: Understanding Different Approaches]

We published a three-part blog series covering what speech recognition is, audio feature extraction for machine learning, and the different approaches to solving the speech recognition problem.

guide-ml Guide on starting Machine Learning
Vyom Pathak

This is a rough guide on starting one’s journey in machine learning, from foundational statistics to advanced deep learning concepts.

journey-1 Professional Journey
Vyom Pathak

My story of landing machine learning internships at Apple and Amazon. A detailed outlook on my professional journey from June 2017 to February 2022.


Miscellaneous
Open Source
Contribution

I have made contributions in the form of model implementations, bug fixes, documentation fixes, type fixes, function deprecations, and adding testing suites. Some of my contributions are in the following open-source projects: PyTorch Ignite, Hugging Face, TensorFlow, Keras, Scikit-learn, Optuna, Pandas, Julia, and Python. You can find more information in my GitHub README.

Awards &
Honors
- Program Chair for the 29th International Conference on Neural Information Processing (ICONIP 2022)
- Silver Medal (Top 3%) in the CommonLit Readability Prize challenge
- Secured All India Rank (AIR) 446 and global rank 2124 in Hash Code 2021
- Appointed as the Machine Learning Team Head at DSC powered by Google, 2020
- Appointed as a Machine Learning Team member at DSC powered by Google, 2019
- AIR 1987 (college rank 3) in the ACM-ICPC online coding round, 2019

*These authors contributed equally to this work.

Big thank you to Jon Barron for the website template! Last updated: 1st Sept 2023