Copypasted from Chip Huyen

Problems I’d work on if I were to do a startup again (though I probably won’t any time soon, because startups are hard). If you’re solving any of them, I’d love to chat.

  1. Data synthesis: AI has become really good at both generating and annotating data. The challenge now is to make sure that the generated data is safe and legal, e.g. that it doesn’t violate any IP.

  2. Evaluation: evaluation has gotten so much harder with LLMs, both because many people treat models as black boxes (we deploy models someone else developed for us) and because outputs can be open-ended. At the same time, investment in evaluation is nowhere close to investment in model or application development. I’d be interested in arena-style evaluation, embedding evaluation, human-in-the-loop evaluation, as well as small, specialized scorers (instead of using large models like GPT-4 as judges).

  3. Energy: the bottleneck to scaling AI is no longer compute but electricity. I’m interested in all energy-related problems, including both new energy sources and energy trading.

  4. Any application that allows you to collect unique data that nobody else has. I’ve heard concerns about building applications that seem to be “wrappers” around popular APIs. If you can get to market early and gather sufficient data to continually improve your product, data is your moat.

  5. GPU-native everything: many data science tools, including scikit-learn, pandas, and Spark, aren’t built to run natively on GPUs. There have been efforts to make these tools leverage GPUs more efficiently, but I think there’s still a lot of room to improve the software layer for GPUs (and not just NVIDIA GPUs).

  6. Curated Internet: bots are already ruining dating apps, search (bots are incredibly good at SEO), and social media. I’d like to be able to set a boundary for my Internet, e.g. to limit the search results to those written by people I trust, or sources verified to be human.