Zaid Khan
My goal is to build trustworthy, teachable multimodal language-driven agents that can reason and write code.
I’m an incoming PhD student in Mohit Bansal’s group (MURGe Lab) at UNC Chapel Hill. I’m also a student researcher in the Media Analytics Group at NEC Laboratories America with Manmohan Chandraker (since 2022). I completed my BS+MS at Northeastern, where I was fortunate to be advised by Raymond Fu.
research interests
- Grounded, complex reasoning tasks, such as interactive theorem proving or open-world image understanding. For example, GPT-4V is still far from solving Winoground, and EncyclopedicVQA is difficult even for PaLI.
- Foundation model-driven agents that learn from grounded interaction. Systems like Voyager and ViperGPT use LLMs as planners but keep them frozen. Can we improve the LLM from interactive feedback? This has been done in formal environments like LeanDojo for theorem proving, but how do we construct virtual environments with feedback for tasks like open-world understanding?
- Neurosymbolic systems in general, and especially approaches that use program synthesis to represent reasoning formally, disentangle reasoning from perception, and impose constraints on behavior.
- Uncertainty quantification and reliable models. A requirement for high-stakes applications (and even personal use) of any AI system is the ability to say “I don’t know”. A problem I’ve been thinking about is selective prediction for open-ended visual question answering, where uncertainty can arise both from the language model itself and from the binding between vision and language.
about
I completed my Masters in Computer Vision and Learning Algorithms at Northeastern University in Boston under Raymond Fu, in close collaboration with the Media Analytics Group at NEC Laboratories America under Manmohan Chandraker, where I worked on grounded language understanding and reasoning. During my Masters, I won a university-wide Outstanding Graduate Student award for my work on the (mis)use of racial categories in computer vision (Scroll.IN reporting, News@Northeastern reporting). Before graduate school, I spent ~3 years as an early member of the engineering / data science organizations at two high-growth startups: Roadie (acquired by UPS for $500m) and Intelligent Flying Machines / OneTrack.AI. As a software engineer, I led efforts to scale data infrastructure to match growth and worked on a range of challenging problems, including embedded deep learning, fault-tolerant distributed systems, realtime adaptive pricing, and data pipelines.
Outside of research, I lift weights, read (here’s my goodreads profile), watch mixed martial arts, and sometimes wonder whether randomness is real.
news
Oct 14, 2024 | DataEnvGym is out! Can we automate the process of generating data to improve a model on diverse, open-ended tasks, based on automatically-discovered model weaknesses? DataEnvGym is a testbed for data-generation agents + teaching environments. Twitter thread |
Mar 7, 2024 | Becoming a member of Mohit Bansal’s group (MURGe-Lab) at UNC Chapel Hill as a PhD student, where I’ll be working on multimodal agents, grounded language reasoning, and other exciting vision/language topics! |
Feb 29, 2024 | Two papers accepted to CVPR 2024, on self-training agents to solve computer vision tasks via program synthesis (summer internship work with NEC Laboratories) and black-box predictive uncertainty for multimodal LLMs. |
Feb 22, 2024 | Joining the PRIOR team at AllenAI this summer. |
Sep 24, 2023 | 1 paper accepted to NeurIPS 2023 on improving the reasoning abilities of open multimodal LLMs with question decomposition (collaboration with NEC Laboratories America). |
Aug 27, 2023 | Completed my Masters in CompE (concentration in Computer Vision and Learning Algorithms) at Northeastern University at Raymond Fu’s lab. |
Aug 25, 2023 | Received a PhD Fellowship from NEC Laboratories America. |
Jun 16, 2023 | 1 paper accepted to CVPR 2023 on self-training with synthetic data for visual question answering (summer internship work with NEC Laboratories America). |
May 22, 2023 | Joining the Media Analytics Group of NEC Laboratories America in San Jose again this summer to work on agentic foundation models for computer vision. |
Jan 27, 2023 | 1 paper accepted to ICLR 2023 on efficient vision-language pretraining. |
Jul 4, 2022 | 1 paper accepted to ECCV 2022 on data-efficient vision-language alignment (collaboration with NEC Laboratories America). |
Feb 4, 2022 | Joining the Media Analytics Group of NEC Laboratories America in San Jose this summer. |
Jul 4, 2021 | 1 paper (oral) accepted to ACM Multimedia 2021 on using language models for multimodal affective computing. |
May 3, 2021 | Received Northeastern’s 2021 Outstanding Graduate Student Award! |
Feb 22, 2021 | 1 paper accepted to FAccT 2021 on why racial categories don’t work for fair computer vision. Media coverage: Scroll.IN, News@Northeastern. |