Zaid Khan
I’m a 1st-year PhD student in Mohit Bansal’s group (MURGe Lab) at UNC Chapel Hill. I’ve also been a longtime student researcher in the Media Analytics Group at NEC Laboratories America with Manmohan Chandraker (since 2022). I completed my BS+MS at Northeastern, where I worked with Raymond Fu.
what I’m working on right now
- automatic skill-based data / environment generation: DataEnvGym frames data generation as an RL-style sequential decision-making problem. The goal is to build agents which can automate the process of identifying the weak skills of a model and generating training data to improve those weak skills. It builds on EnvGen, which generates training environments that help an agent learn skills the agent is weak at.
- LLM-driven exploration and planning: working with Tanmay Gupta at Ai2’s PRIOR team to build agents which can navigate complex environments and plan long-horizon behaviors within those environments. I’m especially interested in environments where classical search methods fail because of how large the state-action space is, such as a large code repository or complex procedural environments.
what I’ve worked on in the past
- Self-training / self-improvement
  - using reinforced self-training to improve program synthesis (Khan et al., CVPR 2024)
  - using unlabeled data to improve vision-language reasoning (Khan et al., CVPR 2023)
- Using uncertainty during reasoning and decision-making
  - using self-consistency to identify unreliable knowledge (Khan et al., CVPR 2024)
  - using uncertainty to decide when to expend more test-time compute (Khan et al., NeurIPS 2023)
- Vision-language representation learning
  - by aligning representations of vision models and language models with a minimal number of parameter updates (Khan et al., ICLR 2023)
  - by learning to reconstruct each modality (Khan et al., ECCV 2022)
background
I completed my Masters in Computer Vision and Learning Algorithms at Northeastern University in Boston under Raymond Fu, in close collaboration with the Media Analytics Group at NEC Laboratories America under Manmohan Chandraker, where I worked on grounded language understanding and reasoning. During my Masters, I won a university-wide Outstanding Graduate Student award for my work on the (mis)use of racial categories in computer vision (Scroll.IN reporting, News@Northeastern reporting). Before graduate school, I spent ~3 years as a software engineer and early member of the engineering / data science organizations at two high-growth startups: Roadie (acquired by UPS for $500m) and Intelligent Flying Machines / OneTrack.AI. There I led efforts to scale data infrastructure to match growth and worked on a range of challenging problems, including embedded deep learning, fault-tolerant distributed systems, realtime adaptive pricing, and data pipelines.
Outside of research, I lift weights, read (here’s my goodreads profile), watch mixed martial arts, and sometimes wonder whether randomness is real.
news
Oct 14, 2024 | DataEnvGym is out! Can we automate the process of generating data to improve a model on diverse, open-ended tasks, based on automatically-discovered model weaknesses? DataEnvGym is a testbed for data-generation agents + teaching environments. Twitter thread |
---|---|
Mar 7, 2024 | Becoming a member of Mohit Bansal’s group (MURGe-Lab) at UNC Chapel Hill as a PhD student, where I’ll be working on multimodal agents, grounded language reasoning, and other exciting vision/language topics! |
Feb 29, 2024 | Two papers accepted to CVPR 2024, on self-training agents to solve computer vision tasks via program synthesis (summer internship work with NEC Laboratories) and black-box predictive uncertainty for multimodal LLMs. |
Feb 22, 2024 | Joining the PRIOR team at AllenAI this summer. |
Sep 24, 2023 | 1 paper accepted to NeurIPS 2023 on improving the reasoning abilities of open multimodal LLMs with question decomposition. (Collaboration with NEC Laboratories America). |
Aug 27, 2023 | Completed my Masters in CompE (concentration in Computer Vision and Learning Algorithms) at Northeastern University in Raymond Fu’s lab. |
Aug 25, 2023 | Received a PhD Fellowship from NEC Laboratories America. |
Jun 16, 2023 | 1 paper accepted to CVPR 2023 on self-training with synthetic data for visual question answering. (Summer internship work with NEC Laboratories America). |
May 22, 2023 | Joining the Media Analytics Group of NEC Laboratories America in San Jose again this summer to work on agentic foundation models for computer vision. |
Jan 27, 2023 | 1 paper accepted to ICLR 2023 on efficient vision-language pretraining. |
Jul 4, 2022 | 1 paper accepted to ECCV 2022 on data-efficient vision-language alignment (collaboration with NEC Laboratories America). |
Feb 4, 2022 | Joining the Media Analytics Group of NEC Laboratories America in San Jose this summer. |
Jul 4, 2021 | 1 paper (oral) accepted to ACM Multimedia 2021 on using language models for multimodal affective computing. |
May 3, 2021 | Received Northeastern’s 2021 Outstanding Graduate Student Award! |
Feb 22, 2021 | 1 paper accepted to FAccT 2021 on why racial categories don’t work for fair computer vision. (Media coverage: Scroll.IN reporting, News@Northeastern reporting) |