CV

Jiawei Zhang

Data Scientist & Ph.D. Researcher

jiaweizhang@uchicago.edu

+1 (718) 916-5636

https://jiawei-zhang-a.github.io

Chicago, IL, US

Summary

Data Science Ph.D. student focusing on LLMs and causal inference while maintaining scikit-learn.

Education

Data Science
Present
University of Chicago
Courses: Large language models, Causal inference, Responsible machine learning, Hierarchical reasoning for AGI
Mathematics & Computer Science
2024-05
New York University
Courses: Machine learning, Statistical modeling, Data engineering, Optimization

Work Experience

Research Scientist Intern
2025-06 - Present
Sapient Intelligence
Advancing the Hierarchical Reasoning Model (HRM) for self-evolving AGI and integrating visual autoregressive (VAR) modeling to improve adaptive planning.
- Designed reasoning pipelines that couple HRM with spatiotemporal world models.
- Explored VAR-based modules that boost predictive reasoning and planning stability.
Open Source Developer
2023-03 - Present
scikit-learn
Maintain and extend core machine learning components, shape release planning, and support new contributors.
- Created BayesSearchCV and delivered Gaussian Mixture and preprocessing improvements adopted in releases.
- Served as a triage team member, guiding contributor onboarding, testing, and review workflows.
Open Source Contributor
2023-03 - Present
pandas
Improved reliability and metadata propagation in the core pandas data analysis library.
- Resolved multi-year metadata propagation regressions impacting production workloads.
- Delivered fixes now included in the mainline library.
Research Assistant
2023-01 - 2024-09
New York University
Co-developed Imputation-Assisted Randomization Tests for causal inference with missing outcomes.
- Authored theoretical guarantees and algorithms for covariate-adjusted randomization tests.
- Released open-source Python and R implementations adopted by collaborating research teams.
Research Assistant
2022-08 - Present
New York University
Modeled heterogeneous treatment effects at scale to inform individualized clinical decisions.
- Analyzed 700k+ study cases to estimate personalized risks for bariatric surgery candidates.
- Constructed confidence intervals that supported updates to ASMBS guidelines.
AI Intern
2021-05 - 2021-11
Tencent Music Entertainment Group
Built speech and language prototypes for consumer music products.
- Developed automatic speech recognition (ASR) and text-to-speech (TTS) models to improve lyric transcription accuracy.
- Implemented smart lyrics features that integrated with large-scale music platforms.

Skills

Data Science

Python
scikit-learn
PyTorch
SQL
Large language models
Causal inference
Machine learning systems
ASR/TTS
Bayesian optimization
GFlowNets

Workflow & Tooling

Experiment design
MLOps
GPU acceleration
Reproducible research
Open-source governance

Publications

From Division to Unity: A Large-Scale Study on the Emergence of Computational Social Science, 1990–2021
2025
The Web Conference (WWW) 2025
Analyzes three decades of publication data to uncover convergence dynamics in computational social science (co-first author).
Design-based causal inference with missing outcomes: missingness mechanisms, imputation-assisted randomization tests, and covariate adjustment
2025
Journal of the American Statistical Association (Theory and Methods)
Introduces imputation-assisted randomization tests and covariate adjustment techniques for causal inference with missing outcomes (co-first author).

Teaching

DATA 22100 Introduction to Machine Learning: Concepts and Applications
2025
University of Chicago
Role: Teaching Assistant
Led labs and discussion sessions covering supervised/unsupervised learning, model evaluation, and responsible ML.
GPH-GU 2363 Causal Inference: Design and Analysis
2023
New York University
Role: Teaching Assistant
Supported course delivery on design-based causal inference, randomized experiments, and sensitivity analyses.

Interests

Research interests
LLM reinforcement learning, interpretability, alignment, causal inference
