About me

I’m an undergraduate student at the University of Illinois Urbana-Champaign studying statistics and computer science. I have been named to the Dean’s List every semester since my freshman year.

I’m a research assistant in the Data Mining Group, supervised by Professor Jiawei Han. My primary research interest is data mining, specifically topic discovery and entity set expansion in natural language processing. Natural language processing arises in many contexts, where it facilitates analyzing large document collections and understanding the context of natural language. I’m also interested in computer vision and robotics.


Unless otherwise specified, each paper is accepted or submitted as a research-track long/regular paper. “∗” indicates equal contribution.

  • Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts
    • Yu Zhang∗, Yunyi Zhang∗, Yucheng Jiang, Martin Michalski, Yu Deng, Lucian Popa, ChengXiang Zhai, Jiawei Han
    • (Accepted by) IEEE BigData 2022 Workshop on Knowledge Discovery and Data Mining in IT Operations (BigData-IT) (short)
  • Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts
    • Yu Zhang∗, Yunyi Zhang∗, Martin Michalski∗, Yucheng Jiang∗, Yu Meng∗, Jiawei Han
    • (Accepted by) The 16th ACM International Conference on Web Search and Data Mining (WSDM 2023)
  • An Analytic Comparison of Student-Scheduled and Instructor-Scheduled Collaborative Learning in Online Contexts
    • Geoffrey Herman∗, Yucheng Jiang∗, Yueqi Jiang, Seth Poulsen, Matthew West, Mariana Silva
    • (Accepted by) The 2022 Conference of the American Society for Engineering Education (ASEE 2022)
  • Seed-Guided Hierarchical Topic Discovery by Exploring Long-Range Contexts
    • Yucheng Jiang∗, Yu Zhang∗, Yunyi Zhang∗, Jiawei Han
    • (Submitted to) The 32nd International World Wide Web Conference (WWW 2023)

Research project

Long-range Context Hierarchical Seed-Guided Topic Discovery

  • Research assistant, Data Mining Group, supervised by Prof. Jiawei Han (Aug. 2022 - Present)
  • Lead a team of three students extending the topic mining framework to a hierarchical structure to facilitate hierarchical taxonomy construction and document classification
  • Designed a framework that leverages pre-trained language models and retrieves topic-indicative document segments to perform topic discovery in a broader context

Entity Set Co-Expansion in StackOverflow

  • Research assistant, Data Mining Group, supervised by Prof. Jiawei Han (Aug. 2022 - Present)
  • Design an entity set expansion model that simultaneously expands multiple types of seeds, using mutual exclusivity among entity sets to determine expansion boundaries
  • Leverage generic natural language knowledge from pre-trained language models to learn seed representations from both the seeds themselves and their contexts

Seed-Guided Topic Mining

  • Research assistant, Data Mining Group, supervised by Prof. Jiawei Han (Nov. 2021 - Aug. 2022)
  • Designed an iterative topic mining framework with text embedding and pre-trained language model based representations
  • Improved a weakly supervised hierarchical multi-label text classification model and tested it on 100 million conference papers
  • Achieved a 13% accuracy improvement over baseline models

POGIL Groupwork Analytics

  • Research assistant, supervised by Prof. Geoffrey Herman and Prof. Mariana Silva (Dec. 2020 - May 2022)
  • Retrieved and integrated over 2 TB of online learning platform data through APIs and web crawling
  • Established a SQL database and data storage schema for the research team
  • Designed a behavioral pattern mining model to extract learning patterns from clickstream activity logs from an open online learning platform
  • Performed a quasi-experimental study on collaboration efficiency, quality, and equality using multilevel modeling

Professional experience

Apple Intern

  • Software Engineer Intern, supervised by Brian Smith (May 2022 - Aug. 2022)
  • Built software to support data analysis for the Wi-Fi and cellular positioning systems
  • Proposed and implemented new algorithms for localization with Wi-Fi and cellular signals, improving accuracy by over 50%
  • Wrote tests and infrastructure code to support the Wi-Fi and cellular positioning systems

Code Review Moderator

  • Software Design Studio code review moderator, supervised by Prof. Michael Woodley (Dec. 2021 - Present)
  • Organized weekly code review sessions with 22 junior students and gave feedback on algorithm analysis, code style, and OOP design strategy
  • Built course infrastructure and maintained an algorithm that assigns more than 450 students to over 120 code review sessions based on students’ time availability and moderators’ skill sets
  • Detected plagiarism in weekly machine project submissions by comparing them against a historical submission pool


  • CS 222 Software Design Lab course assistant (Aug. 2022 - Present)
  • CS 225 Data Structures course assistant and project mentor (Aug. 2022 - Present)
  • CS 374 Intro to Algorithms & Models of Computation course assistant (Aug. 2022 - Present)
  • CS 126 Software Design Studio course assistant (Jan. 2022 - May 2022)
  • CS 128 Introduction to Computer Science II course developer (Jan. 2021 - May 2021)
  • CS 125 Introduction to Computer Science course assistant (Jan. 2020 - Dec. 2020)


  • University of Illinois Urbana-Champaign CRA Undergraduate Research Award (Oct. 2022)
  • University of Illinois Urbana-Champaign Statistics Competition Second Place (Sep. 2021)
  • Software Design Studio course algorithm competition First Place (Mar. 2020)