CV
Jiawei Zhang
Data Scientist & Ph.D. Researcher
Summary
Ph.D. student in Data Science focusing on large language models (LLMs) and causal inference, and a maintainer of scikit-learn.
Education
- Data Science, University of Chicago (Present). Courses: Large language models, Causal inference, Responsible machine learning, Hierarchical reasoning for AGI
- Mathematics & Computer Science, New York University (2024-05). Courses: Machine learning, Statistical modeling, Data engineering, Optimization
Work Experience
- Research Scientist Intern, Sapient Intelligence (2025-06 - Present). Advancing the Hierarchical Reasoning Model (HRM) for self-evolving AGI and integrating visual autoregressive (VAR) modeling to improve adaptive planning.
  - Designed reasoning pipelines that couple HRM with spatiotemporal world models.
  - Explored VAR-based modules that boost predictive reasoning and planning stability.
- Open Source Developer, scikit-learn (2023-03 - Present). Maintain and extend core machine learning components, shape release planning, and support new contributors.
  - Created BayesSearchCV and delivered Gaussian Mixture and preprocessing improvements adopted in releases.
  - Served as a triage team member, guiding contributor onboarding, testing, and review workflows.
- Open Source Contributor, pandas (2023-03 - Present). Improved reliability and metadata propagation in the core pandas data analysis library.
  - Resolved multi-year metadata propagation regressions impacting production workloads.
  - Delivered fixes now included in the mainline library.
- Research Assistant, New York University (2023-01 - 2024-09). Co-developed Imputation-Assisted Randomization Tests for causal inference with missing outcomes.
  - Authored theoretical guarantees and algorithms for covariate-adjusted randomization tests.
  - Released open-source Python and R implementations adopted by collaborating research teams.
- Research Assistant, New York University (2022-08 - Present). Modeled heterogeneous treatment effects at scale to inform individualized clinical decisions.
  - Analyzed 700k+ study cases to estimate personalized risks for bariatric surgery candidates.
  - Constructed confidence intervals that supported updates to ASMBS guidelines.
- AI Intern, Tencent Music Entertainment Group (2021-05 - 2021-11). Built speech and language prototypes for consumer music products.
  - Developed automatic speech recognition (ASR) and text-to-speech (TTS) models to improve lyric transcription accuracy.
  - Implemented smart lyrics features that integrated with large-scale music platforms.
Skills
Data Science
- Python
- scikit-learn
- PyTorch
- SQL
- Large language models
- Causal inference
- Machine learning systems
- ASR/TTS
- Bayesian optimization
- GFlowNets
Workflow & Tooling
- Experiment design
- MLOps
- GPU acceleration
- Reproducible research
- Open-source governance
Publications
- From Division to Unity: A Large-Scale Study on the Emergence of Computational Social Science, 1990–2021 (2025). The Web Conference (WWW) 2025. Analyzes three decades of publication data to uncover convergence dynamics in computational social science (co-first author).
- Design-based causal inference with missing outcomes: missingness mechanisms, imputation-assisted randomization tests, and covariate adjustment (2025). Journal of the American Statistical Association (Theory and Methods). Introduces imputation-assisted randomization tests and covariate adjustment techniques for causal inference with missing outcomes (co-first author).
Teaching
- DATA 22100 Introduction to Machine Learning: Concepts and Applications, University of Chicago (2025). Role: Teaching Assistant. Led labs and discussion sessions covering supervised/unsupervised learning, model evaluation, and responsible ML.
- GPH-GU 2363 Causal Inference: Design and Analysis, New York University (2023). Role: Teaching Assistant. Supported course delivery on design-based causal inference, randomized experiments, and sensitivity analyses.
Interests
- Research interests: LLM reinforcement learning, interpretability, alignment, causal inference