IT Brief UK - Technology news for CIOs & IT decision-makers
Web scraping, alternative data to grow in popularity
Thu, 5th Jan 2023

As businesses and organisations continue to look for new ways to gain insights and make decisions, web scraping and alternative data are growing in popularity, according to Oxylabs, a premium proxy and public web data-acquisition solution provider.

Over the next year, the spotlight will be on optimising generative AI, cybersecurity, and ML, as well as on expanding web data applications.

Increase in generative AI and Cybersecurity significance

Tomas Montvilas, Chief Commercial Officer at Oxylabs, anticipates that generative AI, cybersecurity practices, and alternative data collection can help businesses stay competitive during the upcoming recession. 

“As many economies are predicted to face recession, investment decisions will need to be tighter and less risky. Collecting alternative financial data will be key in getting the necessary signals to make these predictions," he says.

Furthermore, generative AI is increasingly used in business, Montvilas says, and training these models requires massive data sets from the public web.

Another central area Montvilas foresees in the web scraping sector is cybersecurity. Cyber attacks are becoming increasingly sophisticated, necessitating new proactive protection techniques such as threat monitoring and hunting. 

“Cybersecurity practices continue to be adopted by more organisations," he says. 

"Hence, use cases like proactive threat hunting and monitoring that require continuous large-scale web monitoring will become more common."

Spotlight on personal data scraping

During 2022 the scraping industry could breathe a sigh of relief as at least one enduring issue was put to rest. Companies combating scrapers can no longer use the Computer Fraud and Abuse Act (CFAA) to stop scraping public-facing data. 

Denas Grybauskas, the Head of Legal at Oxylabs, says that “we can expect that during 2023 other legal grounds and arguments will be tried and become more popular in the courts against data scraping companies, such as infringement of terms of service, intellectual property protection, etc.”

2022 ended with quite a few stories of personal data scraping and data breaches (Clearview fines in Europe, a Meta database leak that affected more than 500M users, Meta's GDPR fines, etc.). As a result, according to Grybauskas, we can expect regulators and authorities to put more of a spotlight on personal data scraping.

“Finally, 2023 might be the year when the scraping and data collection industry will begin self-regulation initiatives,” says Grybauskas. 

Intensive scale of web data applications

Gediminas Rickevičius, the VP of Global Partnerships at Oxylabs, is confident that, with AI capabilities evolving, the importance and scale of web data applications in commerce will continue to grow next year, just as they did in 2022.

Rickevičius also predicts further parallel evolution of web scraping and blocking systems, which means a greater need for resources and know-how.

“Therefore, I suggest leaving web scraping in the expert’s hands," he says. 

"Although the cost of commercial scraping will increase, doing it yourself will be even more expensive than with professionals’ help.” 

Focus on machine learning 

According to Julius Černiauskas, the CEO at Oxylabs, more machine learning models will be deployed in the field.

“Although there have been many ML failures in the past, I believe the tide is turning for ML engineering teams due to a combination of greater attention on data quality and economic pressure to make ML more useful."

Data scientists were formerly expected to work on a wide variety of data projects. However, as the next generation of data technology becomes more widely used, data scientists will devote their talents to more complex projects using hand-crafted prediction models.

“Many businesses will have difficulties in the next year. As a result, IT companies must discover methods to save expenses, make the greatest use of data scientists' time and talents, and incorporate machine learning and predictive modelling capabilities into teams that directly influence revenue and profitability," says Černiauskas.

“As a result, I expect that businesses will maximise data science resources by augmenting skilled data scientists inside the company with data technologies that automate regular portions of data science work using reliable approaches."