Majority see open web data as vital for fair AI competition
Polling conducted by Oxylabs has revealed that 89% of respondents consider access to public web data essential for a fair and competitive artificial intelligence market.
The survey, carried out during sessions with professionals in the web intelligence community, reflects mounting concern in the sector as barriers to online data rise while major technology companies continue to dominate artificial intelligence investment and returns. Organisations increasingly fear they are losing the ability to access valuable web data, which could hinder efforts to create open and democratic AI systems.
Data restrictions
Live polls indicated that 64% of those surveyed said their organisations had been blocked from more websites over the past year than in the previous period. At the same time, public web-scraped data remains the main source for training AI models for 57% of poll participants, highlighting the AI research and development community's dependence on freely available online information.
The results provide a snapshot of the obstacles confronting AI practitioners who strive to construct fair and unbiased models. During industry discussions, attention was drawn to the trend of increased restrictions, particularly as online content owners respond to the growing demand for data from AI developers.
"Walls are going up across the open web, and AI teams are among the first to feel it. While many restrictions are primarily being put in place to safeguard content from AI companies, closing off the web will equally affect everyone in need of public data - from traditional businesses, to the public sector, and everyday users," said Julius Černiauskas, Chief Executive Officer at Oxylabs.
Balancing protection and openness
Experts at the forum debated the legal and ethical questions arising from new online data controls. Panellists noted the irony of certain data-scraping companies using legal mechanisms to stop others from accessing those companies' own public data, a practice that creates complex regulatory challenges. There was consensus that regulation should strike a balance between protecting data creators and maintaining the openness necessary for innovation across the industry.
"We need to protect the spirit of the internet and allow public data to stay public. Ethical AI depends on open, lawful, and transparent access to information," added Černiauskas.
Technology, people and process
The polling also found that 75% of respondents see the ongoing advancement of anti-blocking technology as critical to keeping web data collection viable over the next five years. With access to public data under threat, technical solutions for navigating web restrictions are considered a priority for organisations seeking to maintain access to the resources required for AI innovation.
Fred de Villamil, Chief Technology Officer at NielsenIQ Digital Shelf, pointed to the broader requirements for overcoming these barriers, saying that obtaining public data would demand not only technological solutions but also organisational capabilities and processes.
Looking ahead
Throughout the event, discussions focused on how artificial intelligence is driving changes in web scraping and how evolving access controls across the internet in turn affect the future direction of AI. Industry voices suggested that responsible AI innovation relies on preserving openness in public data, leaving stakeholders to choose between reinforcing restrictions and cultivating trust in the digital ecosystem.
The survey's findings indicate that a large majority of industry participants now see unrestricted access to public web data as indispensable for maintaining competition and fairness as AI continues to expand.