CV

Jiawei Zhang

Data Scientist & Ph.D. Researcher

jiaweizhang@uchicago.edu
+1 (718) 916-5636
Chicago, IL, US

Summary

Data Science Ph.D. student focusing on large language models (LLMs) and causal inference, and an active open-source developer for scikit-learn.

Education

  • Ph.D. in Data Science
    Present
    University of Chicago
    Courses: Large language models, Causal inference, Responsible machine learning, Hierarchical reasoning for AGI
  • Mathematics & Computer Science
    2024-05
    New York University
    Courses: Machine learning, Statistical modeling, Data engineering, Optimization

Work Experience

  • Research Scientist Intern
    2025-06 - Present
    Sapient Intelligence
    Advancing the Hierarchical Reasoning Model (HRM) for self-evolving AGI and integrating visual autoregressive (VAR) modeling to improve adaptive planning.
    • Designed reasoning pipelines that couple HRM with spatiotemporal world models.
    • Explored VAR-based modules that boost predictive reasoning and planning stability.
  • Open Source Developer
    2023-03 - Present
    scikit-learn
    Maintain and extend core machine learning components, shape release planning, and support new contributors.
    • Created BayesSearchCV and delivered Gaussian Mixture and preprocessing improvements adopted in releases.
    • Served as a triage team member, guiding contributor onboarding, testing, and review workflows.
  • Open Source Contributor
    2023-03 - Present
    pandas
    Improved reliability and metadata propagation in the core pandas data analysis library.
    • Resolved multi-year metadata propagation regressions impacting production workloads.
    • Delivered fixes now included in the mainline library.
  • Research Assistant
    2023-01 - 2024-09
    New York University
    Co-developed Imputation-Assisted Randomization Tests for causal inference with missing outcomes.
    • Authored theoretical guarantees and algorithms for covariate-adjusted randomization tests.
    • Released open-source Python and R implementations adopted by collaborating research teams.
  • Research Assistant
    2022-08 - Present
    New York University
    Modeled heterogeneous treatment effects at scale to inform individualized clinical decisions.
    • Analyzed 700k+ study cases to estimate personalized risks for bariatric surgery candidates.
    • Constructed confidence intervals that supported updates to American Society for Metabolic and Bariatric Surgery (ASMBS) guidelines.
  • AI Intern
    2021-05 - 2021-11
    Tencent Music Entertainment Group
    Built speech and language prototypes for consumer music products.
    • Developed automatic speech recognition (ASR) and text-to-speech (TTS) models to improve lyric transcription accuracy.
    • Implemented smart lyrics features that integrated with large-scale music platforms.

Skills

Data Science

  • Python
  • scikit-learn
  • PyTorch
  • SQL
  • Large language models
  • Causal inference
  • Machine learning systems
  • ASR/TTS
  • Bayesian optimization
  • GFlowNets

Workflow & Tooling

  • Experiment design
  • MLOps
  • GPU acceleration
  • Reproducible research
  • Open-source governance

Publications

  • From Division to Unity: A Large-Scale Study on the Emergence of Computational Social Science, 1990–2021
    2025
    The Web Conference (WWW) 2025
    Analyzes three decades of publication data to uncover convergence dynamics in computational social science (co-first author).
  • Design-based causal inference with missing outcomes: missingness mechanisms, imputation-assisted randomization tests, and covariate adjustment
    2025
    Journal of the American Statistical Association (Theory and Methods)
    Introduces imputation-assisted randomization tests and covariate adjustment techniques for causal inference with missing outcomes (co-first author).

Teaching

  • DATA 22100 Introduction to Machine Learning: Concepts and Applications
    2025
    University of Chicago
    Role: Teaching Assistant
    Led labs and discussion sessions covering supervised/unsupervised learning, model evaluation, and responsible ML.
  • GPH-GU 2363 Causal Inference: Design and Analysis
    2023
    New York University
    Role: Teaching Assistant
    Supported course delivery on design-based causal inference, randomized experiments, and sensitivity analyses.

Interests

  • Research interests
    LLM reinforcement learning, interpretability, alignment, causal inference