Code in a bag: what we learned from the Yandex data leak and what it will lead to

Photo by Vladislav Shatilo / TASS

On Thursday, January 26, it became known that source code and accompanying data for many Yandex services and programs had appeared on the Internet. The archive totals about 45 GB in compressed form and covers many of the company’s products, from Mail and Taxi to Disk and Alice. Forbes looked into how serious a blow this is for the company and what it means for users.

What happened

Twitter user Dmitry Balakov was one of the first to notice the leak. Arseniy Shestakov, CTO of Hack The Publisher, told Forbes that he came across Balakov’s message while scrolling through his Twitter feed and then checked the information about the leak on the BreachForums website. “I wondered whether this data was real. I asked a friend who had worked at Yandex until 2022, and after checking the archives he confirmed it to me. There was no news of the leak at the time,” he says. According to Shestakov, he decided to prepare a full blog post for an English-speaking audience, but he was slightly late: by then another user had already published a post about the leak on Hacker News.

What leaked

Part of the company’s private repositories and the source code of many significant Yandex services became publicly accessible. Among them are the search engine and indexing bot, Maps, Taxi, Direct, Mail, Disk, Market, Travel, Yandex360, Yandex Cloud and the Yandex Pay payment service, as well as lists of phrases that users use to turn off the Alice voice assistant, says Konstantin Melnikov, head of the department for analyzing and evaluating digital threats at Infosecurity a Softline Company.

“The amount of code that has been made publicly available is huge, 44 GB, so new analyses of individual modules will probably keep appearing. Enthusiasts will gradually take apart and analyze the code of specific Yandex products,” says Nikita Nazarov, technical director of the IT company HFLabs. “Yandex initially announced ‘code fragments’, which is far from the truth. We can say for sure that a significant part of the server code of the company’s services has been published,” Shestakov continues.

The archive contains only a snapshot of the code as of an unknown date. The last-modified date of every file has been forcibly set to 02/24/2022, but indirect evidence suggests that the latest changes were made at the end of July 2022, says Nikita Nazarov: “There is no history of changes in the archive, no trained models, no directories, no databases of users and their passwords. You cannot compile this code and launch your own Yandex.Taxi, if only because many internal libraries that did not make it into the leak are missing.”

According to Grigory Bakunov, former Director for Dissemination of Yandex Technologies, the data is largely useless in practice but suitable for studying the code. “First, try building at least something from it: how to do so is far from obvious, and it often requires Yandex’s internal infrastructure,” he writes in his Telegram channel. “Second, the AI projects lack the most important thing, the trained weights, so the model you get after a build is simply untrained. There is no training dataset either. This, of course, is not a hack but a leak by one of the employees.”

What Yandex says

Yandex’s press service confirmed the code leak to Forbes, emphasizing that “there was no hacking.” “Yandex’s security service has found code fragments from an internal repository in public access. However, their content differs from the current version of the repository that is used in Yandex services,” the company noted.

They also indicated that the repositories “are not intended to store personal data of users.” “We are conducting an internal investigation into the causes of source code fragments being made publicly available, but we do not see any threat to our users’ data or the platform’s performance,” the press service assured.

A Forbes source close to Yandex confirmed that “no hackers” were involved in the source code leak. “We are talking about one of the employees,” he explained.

Despite Yandex’s assurances and the assumption that an employee was behind the leak, there are other opinions. According to Igor Bederov, head of the information and analytical research department at T.Hunter, the leak was originally posted on a specialized forum by a user with the nickname borderline. The user wrote that the hack took place back in July 2022 and that the data was being published for the first time “today.” Bederov links this user to a hacker group that, according to him, is responsible for 80% of all hacks involving leaks of users’ personal data in Russia.

How the leak threatens the company and its users

The source code leak indeed does not pose a direct threat to user data, Arseniy Shestakov agrees. “At the same time, access to the source code will make future attacks on the company’s infrastructure easier. We should also expect more phishing attacks, in which criminals pass off their sites as Yandex sites,” he believes.

Yandex’s patents, developments and know-how have been compromised, Igor Bederov believes. In addition, the code makes it possible to study how the services actually work, for example to find out how Yandex monitors users of the Alice smart assistant and whether that diverges from the company’s user agreement: the leak contains text versions of voice requests and passive collection of information, he says.

Leaks of this kind can lead to the exploitation of previously unknown vulnerabilities, says a Forbes source at a large information security company. For example, in his opinion, attackers who obtain the code can find weaknesses and, instead of reporting them to the affected company, write an exploit (program code or a set of instructions that uses a vulnerability to carry out an attack) and, for instance, gain access to user information. Or they can get inside the company, encrypt data, leak data to the public or distribute malware. “The code most likely contains logins and passwords, which Yandex, if that is true, is now urgently trying to close. But if attackers manage to find them, they too will be able to leak, steal and encrypt data and gain a foothold in the system,” he concludes.

According to Forbes sources in the cybersecurity market, some security tools leaked as well: plugins for penetration-testing utilities that Yandex wrote itself. Attackers will be able to use these plugins to carry out attacks. “To summarize, this leak does not explicitly create new threats for users, but it certainly opens up new opportunities for attackers,” says a Forbes source at a large information security company. “It will now be easier to look for vulnerabilities in Yandex services, so breaking into them is just a matter of time.”

The leak of internal repositories threatens the loss of important algorithms behind Yandex services, says Konstantin Melnikov. In his opinion, these algorithms can be used to modify existing applications, to create other algorithms that bypass service blocking, or to build applications and services based on Yandex source code. “There is still an unconfirmed assumption that by studying all the data one can understand how the services mentioned above work and, for example, why the price of a taxi ride rises, or the price of goods on Market.”

However, so far this is only an opportunity to understand the internal structure, and attackers are unlikely to be able to take advantage of it without inside access, other than through hacking attempts and social engineering, a source in the information security market argues. For now, according to him, the risks are mainly reputational: “Such incidents happen often, but not all of them become publicly known. The question remains how successfully the company has closed off access to internal servers, keys and so on.”
