
SCCG Research: Machine learning, AI in sports data and fraud detection


All scientific advancements are based on the work of the scientists who came before them. Still, the last generation didn’t strap rocket packs onto the boots of the next generation of data scientists and fire them into space because their shoulders were getting tired. It was commercial demand for computing power that unlocked the exponential growth of AI and ML’s capabilities and reach. 

First, the increasing needs of the PC gaming industry drove designers and manufacturers like Nvidia to create the ever more powerful graphics processing units (GPUs) needed for next-generation computers, consoles and games. Next, the cryptocurrency boom drove demand for still more powerful GPUs, since tokens like Bitcoin are designed to become scarcer and to require more computational work with each new coin mined.

GPU manufacturers were no longer primarily reliant on gamers’ discretionary (and massive) spending power. They had unlocked a whole new market that was literally creating wealth out of thin air – mountains of imaginary digibucks, which drove further investment and advancement in GPU processor development. This feedback cycle contributed to Nvidia being worth around $3.1trn in market capitalisation today – the third-most valuable company in the world.

Next came the data scientists, who saw GPU manufacturers churning out incredibly powerful hardware that, unlike the PC’s computing generalist, the central processing unit (CPU), accelerates specific tasks like matrix multiplication – exactly the operation ML models need to scale their parameter counts and start making valuable predictions at speed.
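
To make the CPU/GPU distinction concrete, here is a minimal Python sketch of that core operation. It uses PyTorch purely as an illustrative assumption – the matrix sizes and library choice are mine, not a description of any particular vendor’s stack – and simply runs a large matrix multiplication on a GPU when one is available.

```python
# Minimal sketch: the matrix multiplication at the heart of neural-network layers.
# PyTorch and the matrix sizes below are illustrative assumptions.
import torch

# Use a GPU if one is present; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Two large matrices, standing in for a layer's activations and weights.
activations = torch.randn(4096, 8192, device=device)
weights = torch.randn(8192, 4096, device=device)

# One call launches a huge number of multiply-accumulate operations in parallel
# on the GPU; the identical line runs, far more slowly, on a CPU.
output = activations @ weights

print(output.shape, "computed on", device)
```

On a CPU that single line becomes the bottleneck; on a GPU it is exactly the kind of work the hardware was built to parallelise.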


It’s hard for most people to visualise just how big a jump we took in terms of parallel processing and the size of AI/ML models, but let me try.

Imagine a data scientist at home, watering their lawn with a regular old garden hose (yes, I understand how improbable it is that an employed data scientist would water their own yard with a hose, but work with me here). That green stripy hose has a flow rate of about 10 gallons per minute. Imagine that as the capacity of GPT-3.  

Now, imagine a completely different data scientist standing at a console at the top of the Hoover Dam, one of America’s seven modern engineering wonders. They push a big red button on the console, opening the gates within two massive concrete spillways, disgorging 179 million gallons of water per minute into the thirsty Colorado River below. If not for Hoover Dam’s concrete channels and diversion tunnels, the force of the water would arc out several hundred feet before gravity could begin flooding the valley beneath it. 

The garden hose nozzle and Hoover Dam spillway visualisation is a reasonable, real-world comparison between GPT-3, released in mid-2020, and GPT-4, released in early 2023 – less than three years apart. We saw similar growth across Google’s LLM line, from BERT to its successors T5 and the Switch Transformer. To extend the analogy to other competitors, Microsoft released the 17-billion-parameter Turing-NLG and followed up, with Nvidia, with the 530-billion-parameter Megatron-Turing NLG – a direct competitor to the GPT series in size and capabilities, if not branding.

There is still room for growth, as demonstrated by this article’s delivery of the kinds of niche shade you could only see in hypothetical future AI/ML products like OpenAI GPT-Hawaii 5o, Google Switch Transformer 2: Dark of the Moon, or Microsoft’s Turing-NLG 3825/QB4ABCDEFG. Still, given Moore’s Law, how far out could that be? 


So, using today’s non-hypothetical technologies, our commercially available ML models leverage complex algorithms such as support vector machines to classify data, use neural networks to identify patterns, and weigh hundreds of billions of parameters at massive scale.
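
As a concrete illustration of the classification piece, here is a minimal Python sketch using scikit-learn’s support vector machine on synthetic data. The toy dataset and the library choice are assumptions made for illustration, not a description of any operator’s production pipeline.

```python
# Minimal sketch: classifying synthetic event data with a support vector machine.
# The generated dataset and scikit-learn are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A toy dataset standing in for labelled sports or transaction events.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Fit the classifier and report accuracy on the held-out split.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X_train, y_train)
print(f"Held-out accuracy: {clf.score(X_test, y_test):.3f}")
```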

These models’ predictions give sports betting operators better odds, sometimes quickly enough for us to provide our customers with real-time prop betting opportunities. They can look at massive numbers of transactions, identify clusters or anomalies within the data, and flag them as potential fraud as part of business process automation. But here’s the thing. With few exceptions, competitors in every industry are looking at the same sets of data. Sports data may be owned and licensable, but it exists in public. AI and ML leverage frameworks that are used widely across all fields.
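
That anomaly-flagging step could look something like the sketch below, which uses an isolation forest over synthetic transactions. The feature columns, contamination rate and scikit-learn itself are assumptions for illustration, not a portrait of any specific fraud system.

```python
# Minimal sketch: flagging anomalous transactions with an isolation forest.
# The synthetic features and thresholds are illustrative assumptions only.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic transactions: columns might represent stake size and bets per hour.
normal = rng.normal(loc=[50.0, 5.0], scale=[15.0, 2.0], size=(1000, 2))
suspicious = rng.normal(loc=[900.0, 60.0], scale=[50.0, 5.0], size=(10, 2))
transactions = np.vstack([normal, suspicious])

# Fit the model and mark outliers; fit_predict returns -1 for anomalies.
model = IsolationForest(contamination=0.01, random_state=0)
labels = model.fit_predict(transactions)
flagged = transactions[labels == -1]

print(f"Flagged {len(flagged)} of {len(transactions)} transactions for review")
```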

Comic-book superheroes are what they are because they are exceptional by definition. With the commoditisation of the massive power that AI/ML could give every company, any of them could get a shot at membership in the Avengers. 

But that’s not the case. In the upcoming SCCG Research brief on AI and ML, we’ll cover why that is, using sports data and fraud prevention as the framework for the discussion. We’ll also discuss how these powerful capabilities are driving operational change in our industry and what to consider when evaluating them for fitness for purpose.
