Kolena debuts platform for testing AI models and fine-tuned variants


When a business integrates AI models into its operations, a crucial question is when a chosen model is safe to deploy. The issue is not only which model to use or how to use it, but how much testing is needed to establish its reliability. No company wants embarrassing mishaps like those at car dealerships using ChatGPT for customer support, where users tricked the chatbot into agreeing to sell cars for just $1.

Knowing how to test AI models effectively, especially fine-tuned versions, can make or break a deployment. It can determine whether a company keeps its reputation intact or faces financial consequences. Kolena, a San Francisco-based startup co-founded by a former senior engineering manager at Amazon, has announced the wide release of its AI Quality Platform, a web application designed for rapid and accurate testing and validation of AI systems.

Kolena's AI Quality Platform covers data quality monitoring, model testing and A/B testing, tracking of data drift and model degradation over time, and debugging. According to Mohamed Elgendy, Kolena's co-founder and CEO, the company took on this problem to make AI adoption easier for enterprises. Elgendy has firsthand experience testing and deploying AI systems from leadership positions at Rakuten, Synapse, and Amazon.

Kolena's solution is aimed primarily at software developers and IT personnel building safe, reliable, and fair AI systems for real-world applications. It lets them quickly build specific test cases from datasets, so that AI/ML models can be examined closely in the scenarios they will actually encounter. This approach goes beyond aggregate statistical metrics, which can hide a model's poor performance on critical tasks.
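To see how an aggregate metric can hide a failure on a critical task, consider the minimal sketch below. It is not Kolena's API; the function name, the slice names, and the data are all hypothetical and chosen only for illustration. It computes accuracy overall and separately for each named test case.

```python
from collections import defaultdict

def accuracy_by_slice(records):
    """Return (overall accuracy, accuracy per slice).

    Each record is a (slice_name, prediction, label) tuple.
    Hypothetical helper for illustration, not part of any real product.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for slice_name, prediction, label in records:
        totals[slice_name] += 1
        correct[slice_name] += int(prediction == label)
    overall = sum(correct.values()) / sum(totals.values())
    per_slice = {name: correct[name] / totals[name] for name in totals}
    return overall, per_slice

# Made-up results: 95 common daytime images and 5 critical night images.
records = (
    [("daytime", "car", "car")] * 94      # correct detections
    + [("daytime", "truck", "car")] * 1   # one daytime mistake
    + [("night", "car", "car")] * 1       # one correct night detection
    + [("night", "none", "car")] * 4      # four missed night detections
)

overall, per_slice = accuracy_by_slice(records)
print(f"overall: {overall:.2f}")
for name, score in per_slice.items():
    print(f"{name}: {score:.2f}")
```

With this made-up data the overall accuracy is 95 percent, yet the model is right on only one of the five night images. A report that shows only the aggregate number would miss that gap.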

Each customer connects the AI model they want to test through the model's API and supplies their own dataset. They can also specify "functional requirements" describing how the model should behave once deployed, whether it works with text, imagery, code, audio, or other content. In addition, customers can choose to measure bias and diversity across age, race, ethnicity, and other attributes. Kolena then tests the model extensively by simulating hundreds or thousands of interactions, checking that it does not produce undesirable outcomes and reporting how often, and under what circumstances, any issues occur.

The platform also re-tests the model whenever it changes, whether through an update, retraining, or fine-tuning, and whether the change comes from the provider or the customer. It can also re-test during usage and deployment. "Kolena takes the guessing part out of the equation and turns it into a true engineering discipline like software," said Elgendy.
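The idea of re-testing after every model change resembles regression testing in software. The sketch below is a hedged illustration of that idea, not a description of how Kolena implements it. The function name, the test case names, the scores, and the tolerance value are all assumptions.

```python
def find_regressions(baseline, candidate, tolerance=0.01):
    """Return test cases where the candidate model scores worse than the
    baseline by more than `tolerance`, or where a score is missing.

    Both arguments map a test case name to a score between 0 and 1.
    Hypothetical helper for illustration only.
    """
    regressions = {}
    for case, old_score in baseline.items():
        new_score = candidate.get(case)
        if new_score is None or old_score - new_score > tolerance:
            regressions[case] = (old_score, new_score)
    return regressions

# Made-up scores for a customer support model before and after fine-tuning.
baseline = {"refund_requests": 0.92, "price_negotiation": 0.88, "order_status": 0.97}
candidate = {"refund_requests": 0.94, "price_negotiation": 0.71, "order_status": 0.97}

regressions = find_regressions(baseline, candidate)
for case, (old, new) in regressions.items():
    print(f"{case}: {old} -> {new}")
```

Here the fine-tuned model improves on refund requests but drops sharply on price negotiation, so only that test case is flagged. A team could block the deployment until the flagged case is investigated.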

The ability to test AI systems benefits not only enterprises but also AI model providers themselves. Elgendy noted that companies like Google, whose Gemini model recently drew controversy for generating racially inaccurate imagery, could have benefited from using Kolena's AI Quality Platform before deployment.

True to its commitment to quality, Kolena has tested its own AI Quality Platform extensively on a range of AI models, refining it based on user feedback and requirements. The company has offered the platform to customers in a closed beta for the past 24 months, and the lessons from that period have further improved its functionality.

The closed beta customers include startups, Fortune 500 companies, government agencies, and AI standardization institutes. Elgendy said Kolena is actively pursuing customers in three categories: AI foundation model builders, tech buyers, and non-tech buyers. He cited one use case in which a company used Kolena's platform to develop a large language model (LLM) solution for taking orders at fast food drive-throughs. Autonomous vehicle builders are another target market.

Kolena's AI Quality Platform follows a software-as-a-service (SaaS) model, with three pricing tiers that scale with a company's AI growth journey. It covers everything from analyzing data quality to training and deploying models.

Kolena pitches its AI Quality Platform as a comprehensive way to test and validate AI systems across a wide range of industries and applications, so that enterprises can deploy AI without putting their reputation or finances at risk.