Key Takeaways
Perplexity AI faces accusations of bypassing robots.txt instructions on websites.
A BrightEdge study shows Perplexity is as effective as, or better than, Google at referring people to relevant sites.
Perplexity's Pages feature allows users to share information from behind paywalls, which calls for a compromise to ensure fair practices.
I've said many times that I think Perplexity is the best AI chatbot out there at the moment, especially for research. I admit I'm biased when it comes to the complex ethics of AI because I want to see it succeed. Perplexity has seen its share of accusations, but a recently released study is giving me renewed hope that it will all work out.
The Robots.txt Problem
Nearly all AI chatbots get accused of consuming large amounts of water and power or face intense scrutiny over how they collect data to train their models. The accusations against Perplexity AI are a bit more specific than those generalized arguments.
Many websites include a file, robots.txt, that tells AI bots and other web crawlers which parts of the site they should not access. Ignoring these instructions is not necessarily illegal, but they have been more or less honored since the '90s. Perplexity has been accused of bypassing (or using third-party bots that bypass) robots.txt to get its information.
Perhaps this practice of leaving no stone unturned is part of the reason why Perplexity's results are so good. However, sometimes robots.txt protects information behind a paywall or on a site that expects users to sign up and pay for access. When a crawler ignores it, information allegedly from behind those paywalls ends up in Perplexity's results. Major publishers obviously take issue with these practices.
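For readers curious what "following" robots.txt looks like in practice, here is a minimal sketch of the check a well-behaved crawler might run before fetching a page, using Python's standard `urllib.robotparser` module. The bot names (`ExampleAIBot`, `OtherBot`), the `example.com` URLs, and the `/premium/` path are all made up for illustration. This is not how Perplexity or any particular crawler is known to work.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named bot from the whole site and
# keeps every other crawler out of a (hypothetical) paywalled section.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /premium/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(SAMPLE_ROBOTS_TXT)
    checks = [
        ("ExampleAIBot", "https://example.com/news/story"),
        ("OtherBot", "https://example.com/news/story"),
        ("OtherBot", "https://example.com/premium/article"),
    ]
    for agent, url in checks:
        verdict = "allowed" if is_allowed(parser, agent, url) else "blocked"
        print(f"{agent} -> {url}: {verdict}")
```

Nothing in this check is enforced by the website. The crawler has to run it and choose to respect the answer, which is why ignoring robots.txt is possible in the first place.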
The BrightEdge Study
Web publishers and internet marketers have collectively invested millions of dollars in researching how to get discovered or suggested to consumers by Google or other web search engines. Numbers are important, but it's not just about getting content in front of the most people. It's about getting in front of the right people.
A person searching for a list of console games might be interested in other gaming info, but it's not likely they'll be interested in an article about aquarium fish. Websites belonging to small privately owned car dealerships might want to be in the list of results from the search "car dealerships near me," but a Virginia dealership likely wouldn't benefit from a click by a user in California. At the outset of this latest AI revolution, it was unclear if existing internet marketing practices would survive in the face of AI search engines like Perplexity.
Even if people aren't starting their search in AI chatbots like SearchGPT or Perplexity, they still often receive an AI overview result from Google or Bing, which lists its own sources. Web publishers are extra vigilant about the practices of AI companies for reasons that go beyond allegedly stolen content. It's about who gets to steer web traffic around the internet, and a perceived need to know just where that traffic is going.
That's where the BrightEdge study comes in. BrightEdge produces data that publishers use to determine how to get content in front of target consumers. In April, BrightEdge released a study showing that Perplexity is just as good as, or better than, Google at referring people to sites that correlate to their search terms. This is great news for everyone!
Consumers already knew that Perplexity provided great results, but now there's proof. Consumers do click through Perplexity's results directly to the listed sources. Web publishers, hopefully, can relax just a bit, knowing that Perplexity is bringing them just as much traffic as Google, or more.
Can't We All Just Get Along?
There's still the problem of paywalls. Perplexity's Pages feature allows users to publish their results in a beautifully formatted report that's shareable on Perplexity's platform (but not outside it). This means users can see and share information normally hidden behind a content publisher's paywall, allegedly sometimes with entire articles lifted verbatim. Even though the sources are still credited on Pages, I'm afraid this just won't be tolerated.
Even though publishers claim the problem is that Perplexity circumvents robots.txt, that file isn't used exclusively for paywalls. In fact, in light of the BrightEdge study, breaking that "rule" is making everyone money by leading traffic straight from Perplexity to the content users want. The real problem arises when Perplexity circumvents paywalls. In that instance, the user could avoid the site and therefore evade paying for that content.
I'm no hostage negotiator, but it seems like there's room for a compromise here. Perhaps Perplexity should be allowed to refer a steady stream of web traffic regardless of the robots.txt instruction as long as it doesn't go behind paywalls. AI isn't going anywhere, and nothing has slowed it down thus far. Smart regulations and reasonable compromise could make the road less bumpy.