
Kangaroo LLM Launches Massive Web Crawl to Build Australia’s First Open-Source AI Model

Kangaroo LLM

The Kangaroo LLM project today announced the launch of an extensive web crawling initiative to create Australia's first open-source artificial intelligence model. The project's custom web crawler, "Kangaroo Bot," will begin collecting data from 754,000 Australian websites on September 25th to build the VegeMighty dataset, a comprehensive corpus of Australian English content.

With over 4.2 million registered domains in Australia, this initial phase represents a significant step towards developing an AI model that genuinely understands and represents Australian language and culture.

"This initiative marks a pivotal moment in Australia's AI journey," said Vinod Bijlani, AI Practice Leader at Hewlett Packard Enterprise (HPE) and a key partner in the Kangaroo LLM consortium. "By ethically harvesting data from 754,000 websites in this first phase, we're laying the groundwork for an AI that will not only understand Australian English but will also grasp the nuances of our diverse digital landscape. This is more than just data collection; it's about capturing the essence of Australian online communication and culture."

Key aspects of the web crawling initiative include:

  1. Extensive Scope: Targeting 754,000 Australian websites in the first phase to create a diverse and comprehensive dataset.
  2. Ethical Data Collection: Adhering to responsible web crawling practices and respecting website owners' preferences.
  3. Transparency: Commitment to publishing the full list of websites to be crawled, fostering trust and open dialogue.
  4. Data Sovereignty: All collected data will be processed and stored within Australia, ensuring compliance with national regulations.
  5. Immediate Commencement: Web crawling will begin on September 25th, 2024.

The Kangaroo LLM project is committed to responsible data collection. Website owners who wish to opt out of the Kangaroo Bot crawl can do so by adding the following to their robots.txt file:

User-agent: Kangaroo Bot
Disallow: /
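
Website owners who want to sanity-check the rule before publishing it can run it through a robots.txt parser. The sketch below uses Python's standard-library `urllib.robotparser`; the helper name `is_allowed` and the URL `https://example.com.au/about` are hypothetical, chosen only for illustration. It shows how one parser interprets the rule and should not be read as a description of how Kangaroo Bot itself matches user-agent strings, which the announcement does not specify.

```python
from urllib.robotparser import RobotFileParser

# The opt-out rule exactly as published in the announcement.
OPT_OUT_RULE = """\
User-agent: Kangaroo Bot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`.

    Hypothetical helper: it parses the rules from a string, so no
    network request is made.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com.au/about"  # hypothetical URL
    # Under this parser, the named crawler is blocked from the whole site.
    print("Kangaroo Bot allowed:", is_allowed(OPT_OUT_RULE, "Kangaroo Bot", page))
    # A crawler not named in the file is unaffected by the rule.
    print("SomeOtherBot allowed:", is_allowed(OPT_OUT_RULE, "SomeOtherBot", page))
```

Because the rule names a single user agent, it leaves access for every other crawler unchanged. Parsers can differ in how they match user-agent names, so a check like this is indicative rather than conclusive.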

"This extensive data collection effort is not just about creating an AI model; it's about building a foundation for Australia's AI future," Bijlani added. "We're inviting all Australians to be part of this groundbreaking journey, whether by allowing us to include their sites in our dataset or by following our progress."

The Kangaroo LLM consortium, which includes industry leaders such as Katonic, RackCorp, NextDC, Hitachi Vantara, and HPE, views this initiative as a crucial step towards establishing Australia as a leader in ethical AI development.

For more information about Kangaroo LLM or the web crawling initiative, or to check whether your website is included in the crawl list, visit kangaroollm.com.au.

About Kangaroo LLM: Kangaroo LLM is a collaborative project to create Australia's first open-source large language model, specifically tailored for Australian English. Led by a consortium of leading Australian tech companies, the project aims to enhance AI sovereignty, foster innovation, and create new economic opportunities in the Australian tech sector.
