
What Still Needs to Be Done to Achieve the Assistants We Desire


  1. Smaller, More Specialized LLMs: While small LLMs exist, we need them smaller and more specialized still, or at least an easy way (akin to Apple’s Create ML) to train or specialize them. This is crucial for an assistant that is both useful and quick to respond. Most of the processing (inference) should occur locally, on your device (edge computing), covering anything from basic summarization to transforming one kind of data into another. For instance, converting a text to JSON or XML while preserving its information and structure should be swift and efficient; a minimal sketch of this kind of conversion follows the list. Essential data such as to-do lists, calendars, weather updates, and stock prices needs frequent updating, some of it as often as every five minutes. Handling this with a large LLM is not just inefficient but also costly and unnecessary.
  2. Local Vector Database: Building your own database from the data that matters to you should be both fast and user-friendly, so you can store and retrieve that information efficiently; a minimal in-memory sketch also follows the list.
  3. Home Sensors/Microcontrollers: Integration with the central home system should be seamless. It would help if companies like Apple offered older chips for these tasks: affordable small units that run on batteries, or via cable for stationary setups, with access to GPIO pins. A unified programming approach in a high-level language like Swift is essential; Arduino and ESP32 are great, but the need for high-level programming is evident. These smaller devices should be able to convert speech to text locally and quickly, letting the central computer process the input further. That central unit, running another small LLM, should delegate tasks efficiently: fetching online data, checking your to-do list or calendar, or directing a home robot to assist you with a task. A sketch of this kind of routing rounds out the examples below.
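
To make the first point concrete, here is a minimal sketch of the kind of on-device text-to-JSON conversion I have in mind. There is no public Apple API for this today, so the `LocalLLM` protocol and the `StubTodoModel` standing in for a small specialized model are hypothetical; the point is the shape of the workflow: prompt a small local model, then validate its output by decoding it into a typed structure before using it.

```swift
import Foundation

// Hypothetical interface for a small, on-device model. `LocalLLM` and
// `complete(prompt:)` are illustrative names, not a real Apple API.
protocol LocalLLM {
    func complete(prompt: String) -> String
}

// The structure we want a free-form note converted into.
struct TodoItem: Codable {
    let title: String
    let due: String?
}

// Stand-in for a small specialized model: a trivial stub so the sketch
// runs without any model weights. A real model would emit JSON matching
// the schema requested in the prompt.
struct StubTodoModel: LocalLLM {
    func complete(prompt: String) -> String {
        return #"{"title": "Buy milk", "due": "2024-06-01"}"#
    }
}

// Ask the local model to convert free-form text into JSON, then validate
// the result by decoding it into a Codable type.
func convertToTodo(_ text: String, using model: LocalLLM) -> TodoItem? {
    let prompt = """
    Convert the following note into JSON with keys "title" and "due":
    \(text)
    """
    let raw = model.complete(prompt: prompt)
    guard let data = raw.data(using: .utf8) else { return nil }
    return try? JSONDecoder().decode(TodoItem.self, from: data)
}

let todo = convertToTodo("buy milk tomorrow", using: StubTodoModel())
print(todo?.title ?? "could not parse")   // "Buy milk"
```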
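
For the second point, a local vector database does not have to be elaborate. The sketch below is a bare-bones in-memory store using cosine similarity over precomputed embeddings; the vectors are supplied by the caller (they could come from any on-device text encoder), and the toy three-dimensional vectors in the usage example merely stand in for real ones.

```swift
import Foundation

// A minimal in-memory vector store: cosine similarity over precomputed
// embeddings, with no model dependency.
struct VectorStore {
    private var entries: [(text: String, vector: [Double])] = []

    mutating func insert(_ text: String, vector: [Double]) {
        entries.append((text, vector))
    }

    // Return the topK stored texts most similar to the query vector.
    func search(query: [Double], topK: Int = 3) -> [String] {
        entries
            .map { (text: $0.text, score: VectorStore.cosine($0.vector, query)) }
            .sorted { $0.score > $1.score }
            .prefix(topK)
            .map { $0.text }
    }

    private static func cosine(_ a: [Double], _ b: [Double]) -> Double {
        guard a.count == b.count else { return 0 }
        let dot = zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
        let normA = sqrt(a.reduce(0) { $0 + $1 * $1 })
        let normB = sqrt(b.reduce(0) { $0 + $1 * $1 })
        guard normA > 0, normB > 0 else { return 0 }
        return dot / (normA * normB)
    }
}

// Usage with toy vectors standing in for real embeddings.
var store = VectorStore()
store.insert("Dentist appointment on Friday", vector: [0.9, 0.1, 0.0])
store.insert("Buy milk and eggs", vector: [0.1, 0.8, 0.1])
print(store.search(query: [0.85, 0.15, 0.0], topK: 1))  // ["Dentist appointment on Friday"]
```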
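
And for the third point, here is a rough sketch of how the central unit might delegate work. A keyword matcher plays the role that a small intent-classification LLM would play on the hub; the task names are illustrative, and anything the local model cannot handle is escalated to a larger model.

```swift
import Foundation

// The kinds of work the central unit can delegate. Only the last case
// reaches out to a large, remote LLM.
enum AssistantTask {
    case checkTodoList
    case checkCalendar
    case fetchWeather
    case controlRobot(command: String)
    case escalateToLargeLLM(query: String)
}

// Stand-in for the small routing model: a keyword classifier playing the
// role of a tiny intent-classification LLM running on the hub.
func route(_ transcribedSpeech: String) -> AssistantTask {
    let text = transcribedSpeech.lowercased()
    if text.contains("to-do") || text.contains("todo") { return .checkTodoList }
    if text.contains("calendar") || text.contains("meeting") { return .checkCalendar }
    if text.contains("weather") { return .fetchWeather }
    if text.contains("robot") { return .controlRobot(command: transcribedSpeech) }
    // Anything the small model cannot handle locally is escalated.
    return .escalateToLargeLLM(query: transcribedSpeech)
}

// Example: speech is transcribed on the sensor node, then routed here.
switch route("what's on my calendar today") {
case .checkCalendar: print("Reading today's events from the local store")
default:             print("Handled elsewhere")
}
```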

For a truly seamless and integrated experience, the technology should be virtually invisible. Currently, too much effort is required to connect various devices. Improved protocols are necessary, and I believe they are on the horizon. Even these are an intermediary technology; the ultimate solution will operate directly on hardware, leveraging LLMs for efficiency, and high-level programming will become obsolete. That, however, is a vision for the distant future. For now, if we can integrate a few simple components (customizable LLMs, on-device text-to-speech similar to the latest Apple Watch, and simple, affordable microcontrollers programmable with Swift), we can achieve a household ecosystem where robots, homes, and computers blend into the background to support our needs.

In summary, we need small, fast, customizable LLMs for efficient on-device text processing, user-friendly vector-based storage, and powerful microcontrollers capable of running high-level code. These microcontrollers should easily connect to the external world through sensors and actuators. Calls to larger LLMs should be limited to tasks that a small LLM determines necessary. Apple is exceptionally well-positioned to lead in this domain, should they choose to pursue it.

The future holds these advancements, though it is uncertain where they will come from.

Happy coding!


By Cosmin Dolha

Cosmin Dolha, born in 1982 in Arad, Romania, is a dedicated programmer and digital artist with over 19 years of experience in the field. Married to his best friend, Cosmin is a proud father of two wonderful boys.

Throughout his career, Cosmin has designed and developed web apps, RIAs (rich internet applications), real-time apps, and mobile applications for clients in the United States. He has also created around 25 educational games using AS3 and Haxe and spent a year working with Unity (VR, ECS, C#) for the Oculus Go.

Presently, Cosmin focuses on using Swift (Apple) to build software tools that incorporate GPT and Azure Cognitive Services. His interests extend beyond programming and include art, music, photography, 3D modeling (ZBrush, Blender), behavioral science, and neuropsychology, with a particular focus on the processing of visual information.

Cosmin is an avid podcast listener, with Lex Fridman, Andrew Huberman, and Eric Weinstein among his favorites. His reading list can be found on Goodreads, providing further insight into his interests: https://www.goodreads.com/review/list/78047933?shelf=%23ALL%23

His top 10 songs, available as a YouTube playlist, showcase his taste in music: https://www.youtube.com/playlist?list=PL5aMgX67sX9XltpvlYoih7BRAZwMrckSB

For inquiries or collaboration, Cosmin can be reached via email at contact@cosmindolha.com.