We Tried Google’s Gemini AI Chatbot and Found It to Be More Capable but Still Prone to Hallucinations


Google has come a long way with its generative artificial intelligence (AI) offerings. One year ago, when the tech giant first unveiled its AI assistant, Bard, it became a fiasco as it made a factual error answering a question regarding the James Webb Space Telescope. Since then, the tech giant has improved the chatbot’s responses, added a feedback mechanism to check the source behind the responses, and more. But the biggest upgrade came when the company changed the large language model (LLM) powering the chatbot from Pathways Language Model 2 (PaLM 2) to Gemini in December 2023.

The company called Gemini its most powerful language model to date. It also added AI image generation capability to the chatbot, taking it multimodal, and even renamed it Gemini. But just how much of a jump is it for the AI chatbot? Can it now compete with Microsoft Copilot, which is based on GPT-4 and has similar capabilities? And what about the instances of AI hallucination (a phenomenon where AI responds with false or non-existent information as facts)? We decided to find out.

Google’s AI can currently be accessed in multiple ways. Gemini Advanced is a paid subscription with the Google One AI Premium plan that charges Rs. 1,950 monthly. There is an Android app of Google Gemini as well. However, it is not yet available in India. Google Pixel 8 Pro also comes with the Gemini Nano model. For our testing purposes, we decided to use Google’s Gemini Pro-powered web portal, which is available in more than 230 countries and territories and is free to use.

Google Gemini’s generative capabilities

The website’s user interface remains the same, but the name has been changed from Bard to Gemini. If you’re signed in with your Google account, the AI will welcome you with your name and ask, “How can I help you today?” Below are a few helpful prompt suggestions highlighting different tasks it can perform.

First, we asked it to write an email to test its basic generative skills. Considering it is the layoff season, we asked it to “draft a sympathetic email to an employee from a manager informing them that they have been laid off.” To compare its performance, we told Copilot to do the same. We found Gemini’s response to be quite bland and robotic, trying to fit in too much information right at the beginning. The first paragraph said, “It’s with great regret that I write to inform you about a difficult decision we’ve had to make at [Company Name]. Due to [reason], we are required to make reductions in our workforce, and your position will be eliminated, effective [date].”

In comparison, Copilot’s response felt more sympathetic and humane. It wrote, “I hope this email finds you well. I regret to inform you that due to recent organizational changes, we’ve had to make some difficult decisions. Unfortunately, your position at [Company Name] has been affected, and I’m deeply saddened to let you know that you have been laid off.”

It should not come as a surprise since Google’s chatbot is known for being a bit too bland with formal content generation, and this has not changed even with the Gemini model. However, when it comes to informal content generation, Gemini begins to show its capabilities.

We asked both the chatbots to “write a letter informing my mother about me getting laid off and Write a short note to my mother explaining that I was laid off. Make sure to convey my emotions of sadness and grief.”

Copilot took it literally and wrote, “The suddenness of it all has left me feeling a mix of emotions—sadness, grief, and a touch of uncertainty about what lies ahead.” But Google understood the underlying message and did a much better job at expressing the emotions. It wrote, “I know you always worried about job security, and I wish I could tell you this was different. It wasn’t anything I did wrong, they’re [reason]. I’m trying to stay positive, but it’s hard not to feel discouraged. I don’t want you to worry too much, though. I’m not panicking, and I’m already starting to think about my next steps.”

This was much better composed, and one could be forgiven for thinking a real human wrote it. We conducted a few more tests, and Gemini’s responses were much more balanced each time than those of the GPT-4-powered Copilot.

Google Gemini’s accuracy

The next test was to check the accuracy of the AI model. Many use AI to get an answer to a query, and we wanted to do the same. It handled static general knowledge-based questions well. Even when we sent a few grey-zone questions, it stuck to the facts and did not include any extra information. When the questions were controversial, it refused to answer them, which is what you would expect from a responsible AI chatbot.

The problem came when we asked it a popular trick question that has stumped the older generation of the chatbot in the past. When we prompted the question, “countries in Africa that start with the letter K”, it incorrectly responded with, “There are no countries in Africa that start with the letter “K” as of today, February 12, 2024. While situations can change over time, this fact has been consistent for a while.” Kenya, of course, is an obvious counterexample. For the unaware, the issue occurs because the data the LLM was trained on contained the prompted phrase with the same response. Interestingly, this particular glitch is also present on Copilot and ChatGPT, and despite regular reports, it has still not been fixed for any of the chatbots.

However, the problem of AI hallucination did not end there. We also came across several other inaccurate responses by Gemini. When we asked, “Based on reviews, tell me the pros and cons of iPhone 15 Pro”, it responded with, “The iPhone 15 Pro hasn’t been officially announced yet”. In reality, the Apple smartphone was launched in September 2023. In comparison, Copilot fared better in technical questions.

Google Gemini in assistive tasks

Another skill most AI chatbots boast of is their assistive features. They can brainstorm an idea, create an itinerary for a trip, compare your options, and even converse with you. We started by asking it to make an itinerary for a 5-day trip to Goa on a budget and to include things people can do. Since the author was recently in Goa, this was easier for us to test. While Gemini did a decent job at highlighting all the popular destinations, the answer was not detailed and not much different from any travel website. One positive of this is that the chatbot will likely not suggest anything incorrect.

On the other hand, I was impressed by Copilot’s exhaustive response that included hidden gems and even the names of cuisines one should try. We repeated the test with different variations, but the result remained consistent.

Next, we asked, “I live in India. Should I buy a subscription to Amazon Prime Video or Netflix?” The response was thorough and covered various parameters, including content depth, pricing, features, and benefits. While it did not directly suggest one among them, it listed why a user should pick either of the options. Copilot’s answer was the same.

Finally, we spent time chatting with Gemini. This test spanned several hours, and we tested the chatbot on its ability to be engaging, entertaining, informative, and contextual. In all of these parameters, Gemini performed quite well. It can tell you a joke, share lesser-known facts, give you a piece of advice, and even play word and picture-based games with you. We also tested its memory, and it could remember the conversation even after texting for an hour. The only thing it cannot do is give a single-line response to messages like a human friend would.

Google Gemini’s image generation capability

In our testing, we came across a bunch of interesting things about Gemini AI’s image-generation capabilities. For instance, all the images generated have a resolution of 1536×1536, which cannot be changed. The chatbot also refuses to fulfil any requests requiring it to generate images of real-life people, which will likely reduce the risks of deepfakes (creating AI-generated pictures of people and objects that appear real).

But coming to the quality, Gemini did a faithful job of sticking to the prompt and generating images. It can generate random photos in a particular style, such as postmodern, realistic, and iconographic. The chatbot can also generate images in the style of popular artists in history. However, there are many restrictions, and you will likely find Gemini refusing your request if you ask for something too specific. But comparing it with Copilot, I found the images were generated faster, stayed true to the prompts, and appeared to have a wider range of styles we could tap into. However, it cannot be compared to dedicated image-generating AI models such as DALL-E and Midjourney.

Google Gemini: Bottom line

Overall, we found Gemini AI to be quite competent in most categories. As someone who has occasionally used the AI chatbot ever since it became available, I can confidently say that the Gemini Pro model has made it better at understanding natural language communication and gaining a contextual understanding of the queries. The free chatbot version is a reliable companion if one needs it to generate ideas, write an informal note, plan a trip, or even generate basic images. However, it should not be used as a research tool or for formal writing, as these are the two areas where it struggles a lot.

Comparatively, Copilot is better at formal writing and itinerary generation, and on par at holding conversations (albeit with a shorter memory) and comparisons. Gemini takes the crown at image generation, informal content generation, and engaging the user. Considering this is just the first iteration of the Gemini LLM, versus the fourth iteration of GPT, we are curious to witness the different ways the tech giant further improves its AI assistant.

