
What Still Needs to Be Done to Achieve the Assistants We Desire


  1. Smaller, More Specialized LLMs: While small LLMs exist, we need them smaller and more specialized still, or at least an easy way (akin to Apple’s Create ML) to train or specialize them. This is crucial for an assistant that is both useful and quick to respond. Most of the processing (inference) should occur locally, on your device (edge computing), covering anything from basic summarization to transforming one kind of data into another. For instance, converting a text to JSON or XML while preserving its information and structure should be swift and efficient; a minimal sketch of this kind of conversion follows the list. Essential data such as to-do lists, calendars, weather updates, and stock prices needs frequent updating, some of it as often as every five minutes. Handling this with a large LLM is not just inefficient but also costly and unnecessary.
  2. Local Vector Database: Building your own database from the data that matters to you should be both fast and user-friendly, so you can store and retrieve that information efficiently; a minimal in-memory sketch also follows the list.
  3. Home Sensors/Microcontrollers: Integration with the central home system should be seamless. It would help if companies like Apple offered older chips for these tasks: affordable small units that run on batteries, or via cable for stationary setups, with access to GPIO pins. A unified programming approach in a high-level language like Swift is essential; Arduino and ESP32 are great, but the need for high-level programming is evident. These smaller devices should be able to convert speech to text locally and quickly, letting the central computer process the input further. That central unit, running another small LLM, should delegate tasks efficiently: fetching online data, checking your to-do list or calendar, or directing a home robot to assist you with a task. A sketch of this kind of routing rounds out the examples below.
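
To make the first point concrete, here is a minimal sketch of the kind of on-device text-to-JSON conversion I have in mind. There is no public Apple API for this today, so the `LocalLLM` protocol and the `StubTodoModel` standing in for a small specialized model are hypothetical; the point is the shape of the workflow: prompt a small local model, then validate its output by decoding it into a typed structure before using it.

```swift
import Foundation

// Hypothetical interface for a small, on-device model. `LocalLLM` and
// `complete(prompt:)` are illustrative names, not a real Apple API.
protocol LocalLLM {
    func complete(prompt: String) -> String
}

// The structure we want a free-form note converted into.
struct TodoItem: Codable {
    let title: String
    let due: String?
}

// Stand-in for a small specialized model: a trivial stub so the sketch
// runs without any model weights. A real model would emit JSON matching
// the schema requested in the prompt.
struct StubTodoModel: LocalLLM {
    func complete(prompt: String) -> String {
        return #"{"title": "Buy milk", "due": "2024-06-01"}"#
    }
}

// Ask the local model to convert free-form text into JSON, then validate
// the result by decoding it into a Codable type.
func convertToTodo(_ text: String, using model: LocalLLM) -> TodoItem? {
    let prompt = """
    Convert the following note into JSON with keys "title" and "due":
    \(text)
    """
    let raw = model.complete(prompt: prompt)
    guard let data = raw.data(using: .utf8) else { return nil }
    return try? JSONDecoder().decode(TodoItem.self, from: data)
}

let todo = convertToTodo("buy milk tomorrow", using: StubTodoModel())
print(todo?.title ?? "could not parse")   // "Buy milk"
```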
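
For the second point, a local vector database does not have to be elaborate. The sketch below is a bare-bones in-memory store using cosine similarity over precomputed embeddings; the vectors are supplied by the caller (they could come from any on-device text encoder), and the toy three-dimensional vectors in the usage example merely stand in for real ones.

```swift
import Foundation

// A minimal in-memory vector store: cosine similarity over precomputed
// embeddings, with no model dependency.
struct VectorStore {
    private var entries: [(text: String, vector: [Double])] = []

    mutating func insert(_ text: String, vector: [Double]) {
        entries.append((text, vector))
    }

    // Return the topK stored texts most similar to the query vector.
    func search(query: [Double], topK: Int = 3) -> [String] {
        entries
            .map { (text: $0.text, score: VectorStore.cosine($0.vector, query)) }
            .sorted { $0.score > $1.score }
            .prefix(topK)
            .map { $0.text }
    }

    private static func cosine(_ a: [Double], _ b: [Double]) -> Double {
        guard a.count == b.count else { return 0 }
        let dot = zip(a, b).reduce(0) { $0 + $1.0 * $1.1 }
        let normA = sqrt(a.reduce(0) { $0 + $1 * $1 })
        let normB = sqrt(b.reduce(0) { $0 + $1 * $1 })
        guard normA > 0, normB > 0 else { return 0 }
        return dot / (normA * normB)
    }
}

// Usage with toy vectors standing in for real embeddings.
var store = VectorStore()
store.insert("Dentist appointment on Friday", vector: [0.9, 0.1, 0.0])
store.insert("Buy milk and eggs", vector: [0.1, 0.8, 0.1])
print(store.search(query: [0.85, 0.15, 0.0], topK: 1))  // ["Dentist appointment on Friday"]
```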
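
And for the third point, here is a rough sketch of how the central unit might delegate work. A keyword matcher plays the role that a small intent-classification LLM would play on the hub; the task names are illustrative, and anything the local model cannot handle is escalated to a larger model.

```swift
import Foundation

// The kinds of work the central unit can delegate. Only the last case
// reaches out to a large, remote LLM.
enum AssistantTask {
    case checkTodoList
    case checkCalendar
    case fetchWeather
    case controlRobot(command: String)
    case escalateToLargeLLM(query: String)
}

// Stand-in for the small routing model: a keyword classifier playing the
// role of a tiny intent-classification LLM running on the hub.
func route(_ transcribedSpeech: String) -> AssistantTask {
    let text = transcribedSpeech.lowercased()
    if text.contains("to-do") || text.contains("todo") { return .checkTodoList }
    if text.contains("calendar") || text.contains("meeting") { return .checkCalendar }
    if text.contains("weather") { return .fetchWeather }
    if text.contains("robot") { return .controlRobot(command: transcribedSpeech) }
    // Anything the small model cannot handle locally is escalated.
    return .escalateToLargeLLM(query: transcribedSpeech)
}

// Example: speech is transcribed on the sensor node, then routed here.
switch route("what's on my calendar today") {
case .checkCalendar: print("Reading today's events from the local store")
default:             print("Handled elsewhere")
}
```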

For a truly seamless and integrated experience, the technology should be virtually invisible. Currently, too much effort is required to connect various devices. Improved protocols are necessary, and I believe they are on the horizon. Even these are an intermediary technology; the ultimate solution will operate directly on hardware, leveraging LLMs for efficiency, and high-level programming will become obsolete. That, however, is a vision for the distant future. For now, if we can integrate a few simple components (customizable LLMs, on-device text-to-speech similar to the latest Apple Watch, and simple, affordable microcontrollers programmable with Swift), we can achieve a household ecosystem where robots, homes, and computers blend into the background to support our needs.

In summary, we need small, fast, customizable LLMs for efficient on-device text processing, user-friendly vector-based storage, and powerful microcontrollers capable of running high-level code. These microcontrollers should easily connect to the external world through sensors and actuators. Calls to larger LLMs should be limited to tasks that a small LLM determines necessary. Apple is exceptionally well-positioned to lead in this domain, should they choose to pursue it.

The future holds these advancements, though it is uncertain where they will come from.

Happy coding!


By Cosmin Dolha

Cosmin Dolha, born in 1982 in Arad, Romania, is a dedicated programmer and digital artist with over 19 years of experience in the field. Married to his best friend, Cosmin is a proud father of two wonderful boys.

Throughout his career, Cosmin has designed and developed web apps, RIAs (rich internet applications), real-time apps, and mobile applications for clients in the United States. He has also created around 25 educational games using AS3 and Haxe and spent a year working with Unity (VR, ECS, C#) for the Oculus Go.

Presently, Cosmin focuses on using Swift (Apple) to build software tools that incorporate GPT and Azure Cognitive Services. His interests extend beyond programming and include art, music, photography, 3D modeling (ZBrush, Blender), behavioral science, and neuropsychology, with a particular focus on the processing of visual information.

Cosmin is an avid podcast listener, with Lex Fridman, Andrew Huberman, and Eric Weinstein among his favorites. His reading list can be found on Goodreads, providing further insight into his interests: https://www.goodreads.com/review/list/78047933?shelf=%23ALL%23

His top 10 songs, available as a YouTube playlist, showcase his taste in music: https://www.youtube.com/playlist?list=PL5aMgX67sX9XltpvlYoih7BRAZwMrckSB

For inquiries or collaboration, Cosmin can be reached via email at contact@cosmindolha.com.