Zaid Khan

brain@zaidkhan.me

I’m a first-year PhD student in Mohit Bansal’s group (MURGe-Lab) at UNC Chapel Hill. I’m also a longtime student researcher in the Media Analytics Group at NEC Laboratories America with Manmohan Chandraker (since 2022). I completed my BS+MS at Northeastern, where I worked with Raymond Fu.

what I’m working on right now

  • automatic skill-based data / environment generation: DataEnvGym frames data generation as an RL-style sequential decision-making problem. The goal is to build agents that can automatically identify a model’s weak skills and generate training data to improve those skills (a minimal sketch of this loop appears after this list). It builds on EnvGen, which generates training environments that help an agent learn the skills it is weak at.
  • LLM-driven exploration and planning: working with Tanmay Gupta on Ai2’s PRIOR team to build agents that can navigate complex environments and plan long-horizon behaviors within them. I’m especially interested in environments where classical search methods fail because the state-action space is so large, such as large code repositories or complex procedural environments.
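To make the first direction concrete, here is a minimal sketch of the teacher/student loop that a DataEnvGym-style data-generation agent runs: evaluate the student, find its weakest skills, generate targeted data, retrain, repeat. All names in the sketch (SkillReport, evaluate, generate_examples, finetune) are hypothetical placeholders for illustration, not the actual DataEnvGym API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SkillReport:
    """Per-skill accuracy of the student on a held-out diagnostic set (hypothetical)."""
    accuracy_by_skill: dict[str, float]

    def weakest(self, k: int = 3) -> list[str]:
        # Skills where the student performs worst are the ones to target next.
        return sorted(self.accuracy_by_skill, key=self.accuracy_by_skill.get)[:k]


def teacher_student_loop(
    evaluate: Callable[[], SkillReport],            # runs the student on diagnostics
    generate_examples: Callable[[str, int], list],  # data-generation policy (e.g. an LLM)
    finetune: Callable[[list], None],               # updates the student on new data
    iterations: int = 5,
    examples_per_skill: int = 100,
) -> None:
    """RL-style episode: observe student feedback -> target weak skills -> retrain."""
    for step in range(iterations):
        report = evaluate()                 # "state": per-skill student feedback
        targets = report.weakest()          # choose which weak skills to target
        batch = []
        for skill in targets:               # "action": generate data for those skills
            batch.extend(generate_examples(skill, examples_per_skill))
        finetune(batch)                     # transition: student trains on the new data
        print(f"step {step}: targeted skills {targets}")
```

The point of the sketch is only the feedback loop between data generation and student skills; the actual testbed instantiates this loop across multiple tasks and environment structures.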

what I’ve worked on in the past

background

I completed my Master’s in Computer Vision and Learning Algorithms at Northeastern University in Boston under Raymond Fu, in close collaboration with the Media Analytics Group at NEC Laboratories America under Manmohan Chandraker, where I worked on grounded language understanding and reasoning. During my Master’s, I won a university-wide Outstanding Graduate Student award for my work on the (mis)use of racial categories in computer vision (Scroll.IN reporting, News@Northeastern reporting). Before graduate school, I spent ~3 years as a software engineer and an early member of the engineering / data science organizations at two high-growth startups: Roadie (acquired by UPS for $500m) and Intelligent Flying Machines / OneTrack.AI. There, I led efforts to scale data infrastructure to match growth and worked on a range of challenging problems, including embedded deep learning, fault-tolerant distributed systems, real-time adaptive pricing, and data pipelines.

Outside of research, I lift weights, read (here’s my goodreads profile), watch mixed martial arts, and sometimes wonder whether randomness is real.

news

Oct 14, 2024 DataEnvGym is out! Can we automate the process of generating data to improve a model on diverse, open-ended tasks, based on automatically discovered model weaknesses? DataEnvGym is a testbed for data-generation agents + teaching environments. Twitter thread
Mar 7, 2024 Joining Mohit Bansal’s group (MURGe-Lab) at UNC Chapel Hill as a PhD student, where I’ll be working on multimodal agents, grounded language reasoning, and other exciting vision/language topics!
Feb 29, 2024 Two papers accepted to CVPR 2024, on self-training agents to solve computer vision tasks via program synthesis (summer internship work with NEC Laboratories) and black-box predictive uncertainty for multimodal LLMs.
Feb 22, 2024 Joining the PRIOR team at AllenAI this summer.
Sep 24, 2023 1 paper accepted to NeurIPS 2023 on improving the reasoning abilities of open multimodal LLMs with question decomposition. (Collaboration with NEC Laboratories America).
Aug 27, 2023 Completed my Master’s in CompE (concentration in Computer Vision and Learning Algorithms) at Northeastern University in Raymond Fu’s lab.
Aug 25, 2023 Received a PhD Fellowship from NEC Laboratories America.
Jun 16, 2023 1 paper accepted to CVPR 2023 on self-training with synthetic data for visual question answering. (Summer internship work with NEC Laboratories America).
May 22, 2023 Joining the Media Analytics Group of NEC Laboratories America in San Jose again this summer to work on agentic foundation models for computer vision.
Jan 27, 2023 1 paper accepted to ICLR 2023 on efficient vision-language pretraining.
Jul 4, 2022 1 paper accepted to ECCV 2022 on data-efficient vision-language alignment (collaboration with NEC Laboratories America).
Feb 4, 2022 Joining the Media Analytics Group of NEC Laboratories America in San Jose this summer.
Jul 4, 2021 1 paper (oral) accepted to ACM Multimedia 2021 on using language models for multimodal affective computing.
May 3, 2021 Received Northeastern’s 2021 Outstanding Graduate Student Award!
Feb 22, 2021 1 paper accepted to FAccT 2021 on why racial categories don’t work for fair computer vision. Media coverage: Scroll.IN, News@Northeastern.

selected publications

  1. arXiv
    DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
    Khan, Zaid, Stengel-Eskin, Elias, Cho, Jaemin, and Bansal, Mohit
    arXiv preprint arXiv:2410.06215 2024
  2. CVPR
    Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
    Khan, Zaid, Kumar BG, Vijay, Schulter, Samuel, Fu, Yun, and Chandraker, Manmohan
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024
  3. CVPR
    Consistency and Uncertainty: Identifying Unreliable Responses From Black-Box Vision-Language Models for Selective Visual Question Answering
    Khan, Zaid, and Fu, Yun
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024
  4. NeurIPS
    Exploring Question Decomposition for Zero-Shot VQA
    Khan, Zaid, Kumar BG, Vijay, Schulter, Samuel, Chandraker, Manmohan, and Fu, Yun
    In Proceedings of the 37th International Conference on Neural Information Processing Systems (NeurIPS) 2023
  5. CVPR
    Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!
    Khan, Zaid, Kumar BG, Vijay, Schulter, Samuel, Yu, Xiang, Fu, Yun, and Chandraker, Manmohan
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023
  6. ICLR
    Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning
    Khan, Zaid, and Fu, Yun
    In The Eleventh International Conference on Learning Representations 2023
  7. ECCV
    Single-Stream Multi-Level Alignment for Vision-Language Pretraining
    Khan, Zaid, Kumar BG, Vijay, Yu, Xiang, Schulter, Samuel, Chandraker, Manmohan, and Fu, Yun
    In European Conference on Computer Vision 2022
  8. ACM MM
    Exploiting BERT for Multimodal Target Sentiment Classification Through Input Space Translation
    Khan, Zaid, and Fu, Yun
    In ACM Conference on Multimedia 2021
  9. ACM FAccT
    One Label, One Billion Faces: Usage and Consistency of Racial Categories in Computer Vision
    Khan, Zaid, and Fu, Yun
    In ACM Conference on Fairness, Accountability, and Transparency 2021