Chaitanya's Blog

Speculating about AI

Let's summarize what AI can do today. It can:

  1. Answer pretty much any general question you have
  2. Have a conversation with you (in your language)
  3. Code
  4. Create and manipulate images and videos

You might be right in saying it does not do most of these things "well". But I will naively extrapolate from its improvement over the last few years and say it will be "good" at most of them within a few years.

What it seems to lack today:

  1. Human-ness
  2. Accuracy
  3. Memory
  4. Reasoning
  5. Specificity
  6. Capability

Let's dive a bit deeper into these and speculate.

Human-ness: character, creativity

Does AI feel human? Can it create things as well as a human?

I actually don't need or want this to be solved. I want to think of AI as an exceedingly capable, eager, but uninteresting person. Watching AI-generated content is uninteresting to me. I am sure we will make progress here, but I don't look forward to it. My prediction: though AI will be used as a tool for creativity, purely AI-generated content will be seen the way mass-produced items like Wonderbread are seen today.

Accuracy: makes up stuff

Today's AI confidently serves up misinformation and makes frequent mistakes.

This is a problem which needs to be solved, and I am sure it will be largely solved. AI will only be useful or relevant if it is reliable and accurate.

But if I were to equate AI to human intelligence, there is always a good chance that a human will be wrong too. We will have to allow an AI the same margin for error.

In tasks where we cannot afford even a small chance of failure, we can supplement it with deterministic algorithms (maybe even ask the AI to generate them on the fly).
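
As a toy illustration of that supplementing, here is a sketch for arithmetic where a deterministic evaluator overrides the model whenever they disagree. `ask_model` is a hypothetical placeholder, not a real API:

```python
# A toy sketch of supplementing a model with a deterministic check.
import ast
import operator as op

def ask_model(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a real LLM API call

# Safe evaluator for simple arithmetic: the deterministic referee.
OPS = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv}

def evaluate(expr: str) -> float:
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval").body)

def checked_answer(expr: str) -> float:
    claimed = float(ask_model(f"Compute {expr}. Reply with only the number."))
    actual = evaluate(expr)  # ground truth, independent of the model
    return actual if abs(claimed - actual) > 1e-9 else claimed
```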

Memory: does it remember and learn?

Using today's AI is like interacting with the guy from Memento. We need to re-explain who we are, what we are working on and what we have already told it in every single conversation.

This seems to be the lowest-hanging fruit of them all, and it appears to be on track to be solved in assistants, APIs and other interfaces.
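
A minimal sketch of what this bolt-on memory can look like, again assuming a hypothetical `ask_model` call: nothing persists inside the model, so we save notes outside it and re-inject them into every prompt, much like Memento's tattoos.

```python
import json
from pathlib import Path

NOTES = Path("memory.json")  # the "memory" lives outside the model

def ask_model(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a real LLM API call

def remember(fact: str) -> None:
    notes = json.loads(NOTES.read_text()) if NOTES.exists() else []
    notes.append(fact)
    NOTES.write_text(json.dumps(notes))

def chat(message: str) -> str:
    # Re-inject everything we know before asking, since the model forgets.
    notes = json.loads(NOTES.read_text()) if NOTES.exists() else []
    context = "\n".join(f"- {n}" for n in notes)
    return ask_model(f"Known facts about the user:\n{context}\n\nUser: {message}")
```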

Reasoning: why is a particular output being produced?

We can think of reasoning as "high-level" and "low-level": high-level reasoning breaks a complex task down into smaller problems, while low-level reasoning explains how a micro-task is performed.

For example, if we need to walk to the nearest metro station, the high-level tasks would be to find directions, follow them and physically walk to the station. Here, the high-level reasoning is important to understand and tweak, because it is what lets us accomplish many different high-level tasks. The low-level task of physically walking is not: as long as it works well, it's fine. A human cannot put into words exactly how to walk, and a neural net might not be able to either. This reasoning might matter at a scientific or debugging level to improve the models, but on a day-to-day basis, low-level reasoning is unimportant.
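
A small sketch of this split, with made-up function names: the high-level plan is plain, inspectable data we can read and tweak, while the low-level skill stays a black box.

```python
def find_directions(destination: str) -> list[str]:
    # In practice this might be an LLM or a maps API; hardcoded here.
    return ["exit the building", "turn left", "walk two blocks", "enter the station"]

def walk_to(waypoint: str) -> None:
    # Stand-in for an opaque low-level skill, e.g. a locomotion policy.
    print(f"walking: {waypoint}")

def go_to_metro_station() -> None:
    # High-level reasoning: legible steps we can inspect and tweak.
    for waypoint in find_directions("nearest metro station"):
        walk_to(waypoint)  # how walking actually happens never needs explaining
```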

Specificity: can the AI be a specialist?

Today's AI is a great generalist, but it does not do a good job of working within a specific context.

How can AI help me as though it were my office colleague, aware of the specific details of my work? For example, does AI know every detail of corporate law or general medicine, and is it able to perform those tasks as an expert would? Can it continuously update itself as the context changes?

This seems doable to an extent today through a combination of clever prompting and fine-tuning, but it is not good enough. It is also extremely difficult to do: curating the data and actually performing the fine-tuning is a task for a technical expert. I expect this to get a lot better and become much more user-friendly. Anybody should be able to "link" the resources they have to the AI to turn it into a specialist.
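
A toy sketch of what "linking" resources might look like, assuming a hypothetical `ask_model` call. Real systems retrieve with embeddings; crude word overlap stands in for that here.

```python
def ask_model(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a real LLM API call

def score(query: str, doc: str) -> int:
    # Crude relevance: shared words (real systems compare embedding vectors).
    return len(set(query.lower().split()) & set(doc.lower().split()))

def specialist_answer(question: str, linked_docs: list[str]) -> str:
    # Pull the most relevant linked resource into the prompt.
    best = max(linked_docs, key=lambda d: score(question, d))
    return ask_model(f"Using this reference material:\n{best}\n\nAnswer: {question}")
```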

Capability: what can the AI do?

Today's AI is mostly limited to the chatbox. There are a few other ways to access it - Gemini is now an assistant on your phone which can talk. Claude Code now helps you code. Photos app has AI manipulating your images directly. ChatGPT voice mode can recognise items from live video.

The future of AI is in increasing this capability: the ability to do more "low-level" tasks. Seeing, walking, picking things up, and other kinds of interfaces we have not yet thought about in the AI context.

These will enable AI to move out of our screens and permeate our everyday, physical world. LLMs can already churn out JSON commands for hardware when prompted well. You can use this today to tell your robot, or any other kind of hardware, to follow vague commands. VLA models like Gemini Robotics, NVIDIA GR00T and Physical Intelligence pi0 are "generalist" robotics models which could theoretically enable a robot to do anything without specialised training.
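
A sketch of that JSON-command pattern, with a hypothetical `ask_model` call and made-up command names; a real robot would define its own schema. The key point is that the model's free-form output is forced through a strict parse and a whitelist before it ever touches hardware.

```python
import json

def ask_model(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a real LLM API call

PROMPT = ('Respond with only JSON like '
          '{"command": "move", "args": {"x": 1.0, "y": 0.0}}. Instruction: ')

ALLOWED = {"move", "grip", "stop"}  # whitelist of commands the hardware accepts

def command_robot(instruction: str) -> dict:
    cmd = json.loads(ask_model(PROMPT + instruction))  # strict, deterministic parse
    if cmd.get("command") not in ALLOWED:
        raise ValueError(f"refusing unknown command: {cmd}")
    return cmd  # hand off to the hardware driver

# e.g. command_robot("go to the kitchen and pick up the red cup")
```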

This is what excites me the most today, and is a thread in AI I will follow keenly.