HELLO THERE,

I'm Shree Patel

Software Engineer & Data Scientist

About Me

I'm a chemist who loves coding.
The world needs more energy, more medicine, more materials, more science. I hope that my coding skills can speed up scientific discovery and help solve some of these complex issues. I have a diverse background in chemistry, coding, writing, materials, biophysics, 3D modeling, and so much more.
I am proficient in Python, C++, SQL, Bash Scripting, and R. My background combines scientific research and software engineering, and I am passionate about improving chemistry through technology, computational power, and automation. I have a Bachelor’s degree in Chemistry from Carnegie Mellon and am currently pursuing a Master’s in Molecular Science and Software Engineering at UC Berkeley.
When I am not at the computer, I like to write poetry, knit, take on new projects, and binge-watch Star Wars.
I am looking for my next full-time role!

Skills

Education

University of California, Berkeley

Master of Molecular Science and Software Engineering

Expected Graduation: May 2025

CHEM 277B: Machine Learning

CHEM 281: Software Engineering for Scientific Computing

CHEM 274A: Python and C++ for Molecular Sciences

CS 267: Applications of Parallel Computing

DATA 200S: Principles and Techniques of Data Science

CHEM 274B: Software Engineering Fundamentals

Carnegie Mellon University

Bachelor of Science in Chemistry with University and College Honors

Minors: Creative Writing and Engineering Studies

Graduated: May 2023

09-563: Molecular Modeling and Computational Chemistry

19-443: R for Data Science, Technology, and Public Policy

15-110: Principles of Computing

09-231: Mathematical Methods for Chemists

09-322: Molecular Spectroscopy and Design

27-212: Defects in Materials

Experience

AI & Analytics Intern - Digital Endpoints and Patient-Centered Solutions

Genentech

January 2025 - Present

In this role, I am responsible for piloting MVPs to enable data science and AI democratization among real-world data and regulatory stakeholders

  • Piloting MVPs through the Agile SDLC and transitioning validated prototypes to platform teams for scalable deployment
  • Leading discovery sessions with Regulatory and RWD teams to define needs, map data flows, and blueprint GenAI solutions
  • Partnering with regulatory scientists to convert health authority feedback into features for AI-driven clinical protocol evaluations
  • Deploying an AI-driven information organization workflow to identify trends in health authority questions about clinical trial protocols
  • Built LLM-based automation workflows to transform unstructured clinical trial data into modeling pipelines for decision-making
  • Conducted large-scale data analysis on 20M+ medical records using bioBERT-based clustering to surface clinical decision signals
  • Developed a 4-step AI-driven workflow to extract and organize information related to treatment adherence from doctor’s notes
  • Demonstrated a 10% prediction improvement over traditional ML methods in identifying trends and correlations in medical data

Team Lead - Open Innovation Squad

UC Berkeley

January 2025 - June 2025

In this role, I was responsible for leading a cross-functional team to deliver a comprehensive competitor, feature, and market analysis for an agentic AI workflow automation and data analysis product

  • Conducted competitive benchmarking and user research to surface user needs and pain points and to guide product differentiation
  • Facilitated iterative design thinking cycles and incorporated customer feedback to boost non-technical user engagement by 15%
  • Defined KPIs and delivery timelines, aligning internal and client teams to gain 4% monthly revenue and 7% satisfaction growth
  • Coordinated communication across the client, internal team, and Open Innovation Board to maintain strategic and executional alignment
  • Defined low-fidelity prototypes and feature sets to enhance AI functionalities, aligning with user needs and business objectives
  • Analyzed AI and data regulations in deployment markets to comply with international legislation such as GDPR and the EU AI Act

Laboratory Development Engineer

Emerald Cloud Lab

June 2023 - June 2024

In this role, I was responsible for designing, building, and deploying solutions to automate laboratory workflow

  • Led the design and deployment of a customized magnetic tube rack system to minimize dead volume and improve lab efficiency
  • Cut material resource usage by 25% by iterating on the magnetic rack to enhance ergonomic design and space utilization
  • Reengineered experiment database on AWS to streamline magnetic rack data handling and change operator task prioritization
  • Designed and installed customized IR temperature sensor holders to standardize positioning and data collection
  • Led a team of 8 temporary workers to assemble and integrate various sensor arrays, accelerating the build-out by 15%
  • Updated deprecated solvent bottle location objects using Wolfram Mathematica to increase speed for storage tasks by 10%

Undergraduate Research Assistant

Kurnikova Group (CMU)

March 2022 - September 2023

I co-authored a paper detailing a machine learning workflow to reduce computational costs for protein-ligand binding by 85%

  • Simulated complex alchemical thermodynamic cycles with molecular dynamics for RBFE / ABFE calculations on T4 lysozyme
  • Scripted selection algorithms in Python to implement machine learning ligand orientation predictions for binding simulations
  • Analyzed the electrostatic potential of the WD40 domain of LRRK2, identifying 4 target regions for future drug development
  • Developed force field parameters for ligands using GAFF2 and derived atomic charges via the RESP method with Gaussian
  • Utilized GPU clusters to perform high performance computing for expensive protein-ligand simulations

Awards:

  • 1st Place Capstone Presentation -- highest score in the last 5 years

Manufacturing Science and Technology Intern

Merck & Co.

May 2022 - August 2022

I used statistical and analytical techniques to double the speed of tablet stability assessments

  • Designed tablet stability experiment across temperature and humidity conditions to collect high-quality degradation data
  • Conducted stability testing with analytical chemistry techniques and solution preparation methods with LC-MS
  • Modeled laboratory data in R with Ridge and Lasso regressions, using RMSE to assess the accuracy of the detection protocol
  • Developed Bayesian machine learning model to double the speed of stability assessments and degradation forecasts

Undergraduate Research Assistant

Jin Laboratory

December 2020 - August 2022

I designed synthetic pathways for novel small thiolated gold nanoclusters

  • Utilized UV-Visible Spectroscopy and Photoluminescence Spectroscopy to analyze properties related to electronic structure
  • Conducted experiments with techniques like aqueous-organic separation, ligand exchange, and thiol etching
  • Probed vibrational levels in nanoparticle metallic core with THz Raman Spectroscopy and cryo-optical methods

Awards:

  • 1st Place in Chemistry -- CMU Sigma Xi Quantitative Research Competition
  • 2nd Place Overall -- CMU Sigma Xi Quantitative Research Competition

Poster Presentations:

  • Synthesis and Characterization of Thiolate-Protected Gold Nanoclusters of Atomic Precision
  • Presented at: ACS Fall 2022, ACS Regional Symposium Spring 2021 and CMU Meeting of the Minds

Technical Operations Chemistry Intern

Merck & Co.

May 2021 - August 2021

I employed my chemical expertise to resolve antibiotic manufacturing process deviations

  • Conducted 50+ flow experiments on column chromatography set-up to confirm raw material functionality
  • Documented experimental results in GMP format to support process change requests for raw materials specifications
  • Performed weekly safety inspections, housekeeping walkthroughs, and flow tests on safety equipment

Projects

Click on a title to view the project code and/or publication!

LLM-Powered Information Extraction from Patient EHR Data

I created a 4-step, LLM-powered, low-code workflow that helps users quickly explore their data, reducing analysis time from days to hours. We aimed to extract treatment adherence insights from free-text doctor's notes and use them to train a machine learning model predicting visual acuity improvement after one year. I worked with product leaders and data scientists to refine the approach, which involves breaking down user queries into key factors, scoring each doctor's note accordingly, and combining the results with demographic data in a tabular format. This enabled us to identify serious comorbidities and patient independence as key predictors of treatment success.
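The shape of that workflow (factors, per-note scores, one combined table) can be sketched in a few lines. This is an illustration only, not the project code: the keyword match in `score_note` is a stand-in for the LLM scoring step, and every factor name, note, and demographic value below is invented toy data.

```python
def score_note(note, factor_keywords):
    """Stand-in for the LLM scoring step (assumption): 1 if any keyword appears, else 0."""
    text = note.lower()
    return int(any(keyword in text for keyword in factor_keywords))


def build_table(notes, demographics, factors):
    """Combine per-factor note scores with demographic fields, one row per patient."""
    rows = []
    for patient_id, note in notes.items():
        row = {"patient_id": patient_id, **demographics.get(patient_id, {})}
        for factor, keywords in factors.items():
            row[factor] = score_note(note, keywords)
        rows.append(row)
    return rows


# Hypothetical factors a user query might be broken down into.
factors = {
    "missed_visits": ["missed", "no-show"],
    "lives_independently": ["lives alone", "independent"],
}
# Invented toy notes and demographics, for illustration only.
notes = {
    "p1": "Patient missed two injections.",
    "p2": "Independent, attends all visits.",
}
demographics = {"p1": {"age": 71}, "p2": {"age": 64}}

table = build_table(notes, demographics, factors)
for row in table:
    print(row)
```

Each row is a plain dictionary, so the result can be handed to any tabular tool or model-training step.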

Tech Stack / Techniques:

Compensated Summation, Cheminformatics Graph Walking, and Substructure Searching

For my final project in the MSSE CHEM 274A course, I developed a computational toolset to explore molecular systems using advanced techniques in computational chemistry. The project involved implementing methods such as compensated summation, graph walking over molecular adjacency matrices, and substructure searches that hash functional groups. I also built a framework for reproducible and scalable scientific computations that can be easily accessed through a user interface. This work highlights my ability to apply interdisciplinary skills to solve complex problems in the molecular sciences.
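To illustrate the first technique named above, here is a minimal sketch of compensated summation in its common Kahan form. It is an assumption that this variant matches the project; the function names are illustrative and this is not the project code.

```python
def naive_sum(values):
    """Plain left-to-right float addition; rounding error accumulates."""
    total = 0.0
    for x in values:
        total += x
    return total


def compensated_sum(values):
    """Kahan summation: carry a correction term for the bits lost at each step."""
    total = 0.0
    compensation = 0.0  # running estimate of the rounding error so far
    for x in values:
        y = x - compensation            # re-apply the error from the previous step
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover exactly what was lost
        total = t
    return total


values = [0.1] * 10
print(naive_sum(values))        # 0.9999999999999999
print(compensated_sum(values))  # 1.0
```

An explicit loop is used for the naive version so the comparison does not depend on how the built-in `sum` handles floats in a given Python version.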

Tech Stack / Techniques:

Leveraging Computer Vision to Efficiently Allocate Emergency Resources

I co-developed a logistic regression model and a convolutional neural network (CNN) using scikit-learn and TensorFlow to classify images with 90% accuracy. To handle a dataset of 20,000 images, I applied computer vision techniques such as Sobel edge filtering for feature extraction and to scale the model implementation. I further improved the damage detection accuracy by 30% by using Principal Component Analysis (PCA), hyperparameter tuning, 5-fold cross-validation, and gradient descent optimization.
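As a rough illustration of the Sobel edge filtering mentioned above, the sketch below computes gradient magnitudes on a tiny grayscale image in pure Python. It is not the project code, which would more plausibly rely on an image library; the toy image and the choice to leave border pixels at zero are assumptions made for brevity.

```python
def sobel_magnitude(image):
    """Return Sobel gradient magnitudes for a 2D grayscale image (borders left at 0)."""
    gx_kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # horizontal intensity change
    gy_kernel = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # vertical intensity change
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += gx_kernel[dr][dc] * pixel
                    gy += gy_kernel[dr][dc] * pixel
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out


# Toy image: dark left half, bright right half, so there is one vertical edge.
image = [[0, 0, 0, 10, 10, 10] for _ in range(4)]
edges = sobel_magnitude(image)
print(edges[1])  # [0.0, 0.0, 40.0, 40.0, 0.0, 0.0]
```

The response is nonzero only where the window straddles the dark-to-bright boundary, which is what makes the filter useful as an edge feature.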

Tech Stack / Techniques:

Predicting Regenerative Properties at the Genomic Level with Machine Learning

I automated the extraction of trimer counts and the calculation of the AT/GC ratio for over 100 genes using Pandas, NumPy, and Biopython. This, combined with the NCBI Command Line Tool, streamlined the data processing workflow. I designed a convolutional neural network (CNN) with a Long Short-Term Memory (LSTM) layer using TensorFlow to classify regenerative genes from the NCBI database, achieving 75% accuracy. I visualized the trimers based on their regenerative importance, mapping prevalent amino acids and validating the results against existing literature. Additionally, I developed an API to access UniProt protein structures for known regenerative proteins and automated comparisons with AlphaFold.
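The two sequence features described above can be sketched with the standard library alone. This is an illustration, not the project code (which used Pandas, NumPy, and Biopython); counting overlapping trimers and the toy sequence are assumptions.

```python
from collections import Counter


def trimer_counts(sequence):
    """Count overlapping 3-base windows in a DNA sequence (overlap is an assumption)."""
    seq = sequence.upper()
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))


def at_gc_ratio(sequence):
    """Ratio of A/T bases to G/C bases; raises if the sequence has no G or C."""
    seq = sequence.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if gc == 0:
        raise ValueError("sequence contains no G or C bases")
    return at / gc


toy_sequence = "ATGCGATATG"  # invented example, not real gene data
counts = trimer_counts(toy_sequence)
print(counts["ATG"])              # 2
print(at_gc_ratio(toy_sequence))  # 1.5
```

Features like these can then be collected per gene into a table and passed to a classifier.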

Tech Stack / Techniques:

Collaborative Modular Banking Software for Transaction Management

I co-designed and implemented a banking system using Python to streamline account management and transaction processing. To improve efficiency, I refactored transactions, spending aggregation, and balance retrieval by utilizing dictionary lookups and binary search to reduce runtime. I also debugged account histories for joint accounts, creating a unified transaction log through efficient data aggregation and handling, ensuring seamless access and accurate transaction tracking across multiple users.
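A minimal sketch of how dictionary lookups and binary search can speed up balance retrieval, under stated assumptions: the `Account` class, its method names, and integer timestamps are hypothetical and are not taken from the project.

```python
from bisect import bisect_right


class Account:
    """Hypothetical account that stores a running balance after each transaction."""

    def __init__(self):
        self.timestamps = []  # transaction times, kept in non-decreasing order
        self.balances = []    # balance immediately after each transaction

    def record(self, timestamp, amount):
        """Append a transaction; assumes timestamps arrive in order."""
        previous = self.balances[-1] if self.balances else 0
        self.timestamps.append(timestamp)
        self.balances.append(previous + amount)

    def balance_at(self, timestamp):
        """Binary-search for the last transaction at or before `timestamp`."""
        idx = bisect_right(self.timestamps, timestamp)
        return self.balances[idx - 1] if idx else 0


# Dictionary keyed by account id gives constant-time (on average) account lookup.
accounts = {"alice": Account()}
accounts["alice"].record(1, 100)
accounts["alice"].record(5, -30)
accounts["alice"].record(9, 50)

print(accounts["alice"].balance_at(7))  # 70
```

Storing running balances trades a little memory for lookups that take logarithmic time instead of re-summing the whole history.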

Tech Stack / Techniques:

Automated Resource Allocation for Protein-Ligand Binding Simulations

I collected data by running both RBFE and ABFE calculations, simulating alchemical thermodynamic cycles for our benchmark system and our ligands. I also processed the data and helped guide the development of our automation framework, which we described in a submitted manuscript. Our resource allocation algorithm saved up to 85% of the computational cost of high-throughput protein-ligand binding free energy simulations.

Tech Stack / Techniques:

Bayesian Machine Learning Model Development for Tablet Stability Experiments

I developed a Bayesian machine learning model that reduced analytical data collection time by 50% for high-throughput tablet stability experiment design. I collaborated with the LC-MS team to create a more sensitive method for quantifying trace degradation levels and conducted a short-term stability study using this approach. With my model, I accurately predicted degradation patterns for up to two years with just three months of data, cutting the previous requirement of six months in half.

Tech Stack / Techniques:

Modeling the Impact of Electric Vehicles (EVs) on Greenhouse Gas Emissions

In this project, I aimed to gain insights into EV purchase trends and their environmental implications. I utilized GeoPlot to visualize the locations of purchases over a map of New York State, allowing me to identify trends based on population density and socioeconomic factors. Additionally, I employed linear regression to investigate the correlation between EV sales and GHG emissions, while applying elastic net regressions to enhance model accuracy by incorporating penalty terms for coefficients.
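To make the "penalty terms for coefficients" concrete, here is a small sketch of an elastic net objective, parametrized with `alpha` and `l1_ratio` the way scikit-learn's `ElasticNet` documents it. Whether the project used that library is an assumption, and the data below are toy values, not EV or emissions data.

```python
def elastic_net_loss(X, y, coefs, intercept=0.0, alpha=1.0, l1_ratio=0.5):
    """Squared-error loss plus a weighted mix of L1 and L2 coefficient penalties."""
    n = len(y)
    residual_sq = 0.0
    for row, target in zip(X, y):
        prediction = intercept + sum(w * x for w, x in zip(coefs, row))
        residual_sq += (target - prediction) ** 2
    l1 = sum(abs(w) for w in coefs)  # lasso-style penalty, encourages sparsity
    l2 = sum(w * w for w in coefs)   # ridge-style penalty, shrinks large weights
    penalty = alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2)
    return residual_sq / (2 * n) + penalty


# Toy data where y = 2 * x exactly, so the fit term is zero for coefs = [2.0].
X = [[1.0], [2.0]]
y = [2.0, 4.0]

print(elastic_net_loss(X, y, coefs=[2.0], alpha=0.0))  # 0.0
print(elastic_net_loss(X, y, coefs=[2.0], alpha=1.0))  # 2.0
```

With `alpha=0` the objective reduces to ordinary least squares; raising `alpha` makes large coefficients more costly even when they fit the data perfectly.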

Tech Stack / Techniques: