Help Predict the Future of AI in Software Development!

Jun 2, 2025 - 22:50

Ever wanted to share your ideas about AI and have a chance at winning prizes at the same time? As a company dedicated to creating the best possible solutions for software development, we at JetBrains want to know what you think about AI in software development. 

In this post, we tell you more about the tournament and offer tips for making accurate predictions. If you’re new to forecasting platforms, there’s a detailed overview of how they work below.

Let’s get started so that you can add your voice to community-sourced forecasting!

JetBrains Research’s AI in Software Development 2025 tournament

To participate in the tournament, all you have to do is register on Metaculus and complete this short survey.

Make sure to submit your predictions before the questions resolve on December 1, 2025!

Tournament specs

With this forecasting challenge, we are primarily interested in seeing how accurately participants can predict emerging AI features in software development.

We also want to understand:

  • Developers’ attitudes about AI and how they are evolving
  • The individual characteristics of the best forecasters
  • How people estimate the future of various benchmarks

Currently, the tournament includes 13 questions. To keep everything fair, we have invited independent experts to review the questions and to evaluate the end resolutions. These experts are:

  • Olga Megorskaya, Chief Executive Officer at Toloka
  • Grigory Sapunov, Co-Founder and CTO at Intento
  • Iftekhar Ahmed, Associate Professor at the University of California, Irvine 
  • Hussein Mozannar, Senior Researcher at Microsoft Research AI Frontiers
  • Dmitriy Novakovskiy, Head of Customer Engineering at Google Cloud

Rankings and the prize pool

In this tournament, your ranking will be calculated based on your peer score. 

Generally speaking, a positive score indicates above-average accuracy and a negative score below-average (Metaculus explains exactly how it calculates the peer score). More specifically, your ranking is calculated from the sum of your peer scores over all the questions, each of which is individually weighted. This means that if you do not forecast a specific question, you score zero on that question.
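As a rough illustration of the mechanics: a peer score compares your log score with the average log score of the other forecasters on the same question. The sketch below is a simplification of that idea (the real Metaculus formula also accounts for coverage over time and question weights), not the platform’s exact implementation:

```python
import math

def peer_score(my_prob: float, others_probs: list[float], outcome: bool) -> float:
    """Simplified peer score for one binary question.

    The log score is ln(p) if the event happened and ln(1 - p) otherwise.
    The peer score is your log score minus the average log score of all
    other forecasters, scaled by 100, so positive means above average.
    """
    def log_score(p: float) -> float:
        return math.log(p if outcome else 1.0 - p)

    others_avg = sum(log_score(p) for p in others_probs) / len(others_probs)
    return 100.0 * (log_score(my_prob) - others_avg)

# You forecast 80%, three peers forecast 60%, 50%, and 40%,
# and the event happens: you beat the crowd.
print(round(peer_score(0.80, [0.60, 0.50, 0.40], outcome=True), 1))  # → 48.4
```

Forecasting 80% on a question that resolves Yes while the crowd sits near 50% yields a solidly positive score; had the same question resolved No, the same forecast would score well below zero.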

For the AI in Software Development 2025 Tournament, we have a USD 3,000 prize pool, which will be distributed among the top three forecasters on the leaderboard as follows (all prizes in USD):

  • First place: $1,500
  • Second place: $1,000
  • Third place: $500

Note that in order to be eligible for the prize pool, you must fill out the quick research survey!

Tips for making accurate predictions on forecasting platforms

Here are some tips to get you on the path to positive peer scores and higher rankings:

  • Consider alternative scenarios before placing your forecast. This is generally a good idea, and it is especially useful when the event concerns something novel or highly uncertain.
  • Ongoing news can inform the probabilities of different outcomes, so stay informed!
  • Beware of overconfidence. Besides considering alternatives, it is useful to write down the reasons why your forecast could be wrong.
  • As with many skills, practice helps. On a platform like Metaculus, you can improve by posting your reasoning in the discussion section and reading about other participants’ reasoning.
  • Once you have forecasted a few questions as practice, compare your track record with the community’s. (But don’t base your predictions solely on the community median. Your insights and evidence are valuable, too!)

For more resources, check out Metaculus’ collection of analysis tools, tutorials, research literature, and tips, as well as their forecasting guide for each type of question.

Online forecasting tools: a primer

What are online forecasting tools? By combining user inputs with sophisticated statistical modelling, these tools produce predictions of future events.

If you’ve never heard of forecasting platforms before, you might guess that they are like gambling sites. While there are some similarities with betting, online forecasting tools are not synonymous with gambling, whether online or at the tracks. A crucial difference is that forecasting tools are used by people interested in gathering information about future events, not necessarily (or solely) in profiting from their outcomes. Our forecasting tournament, in particular, focuses on evaluating participants’ prediction skills; the prizes are simply perks for the top-ranked forecasters and an exception among questions on the hosting platform, Metaculus.

Another type of information-gathering tool is a poll or a survey. While similar in empirical intent, poll questions often ask about participants’ (a) experiences, (b) ideas, or (c) preferences, not about tangible, objective facts that can be unambiguously resolved. Here are some real-world examples from YouGov (UK): (a) whether the participants have watched political content on TikTok, (b) participants’ views on banning phones in schools, and (c) which Doctor Who version the participant prefers.

While there might be a clear winner among the responses, the results reflect people’s preferences and thoughts, sometimes about facts, but they are not facts themselves. Likewise, survey results vary across demographics.

For survey question (b), there is a clear winner in the results below, but this is only the opinion of the people in the UK who were asked. And while a respondent may be interested in the results (e.g. they really want schools to ban phones), there is no direct gain for having given a more popular or more accurate response.

Source: YouGov plc, 2025, © All rights reserved. [Last access: May 22, 2025]

In contrast, a forecasting query’s responses are evaluated for accuracy against facts at the time of resolution. Those participating are actively interested in the resolution, as it affects leaderboard prestige and/or financial reward, depending on the type of forecasting platform. This also means that participants are more motivated to give what they believe are accurate predictions, even when those do not fully align with their personal preferences at the time.

Forecasting platforms often involve binary questions, like Will DeepSeek be banned in the US this year? Queries can also cover uncertain events with multiple possible outcomes, e.g. the winner of Eurovision 2025, where, until the finals, many countries have a chance. Similarly, queries with numerical ranges, such as predicting the Rotten Tomatoes score of Mission: Impossible – The Final Reckoning, weigh the likelihood of different ranges. Even if different platforms handle the calculations slightly differently, the main takeaway is that questions on forecasting platforms have resolution deadlines and can be unambiguously resolved. See the figure below for a snapshot of the rules summary for the Mission: Impossible question on Kalshi.

Source: Kalshi. [Last access: May 22, 2025]

The following subsections present the history of forecasting tools, including the most common kinds and which one is relevant for this forecasting challenge.

A history of prediction

Forecasting mechanisms have existed informally for centuries, with people predicting outcomes like papal or presidential election results. More formal forecasting tools were established at the end of the 20th century, starting with a similar focus, and have since gained currency while expanding their applications.

Well-known examples of formal forecasting mechanisms include the Iowa Electronic Market, created as an experimental tool in 1988 for the US presidential election’s popular vote, still in use today; Robin Hanson’s paper-based market, created in 1990 for Project Xanadu employees to make predictions on both the company’s product and scientific controversies; and the online Hollywood Stock Exchange, established in 1996 as a way for participants to bet on outcomes in the entertainment industry. 

These forecasting tools demonstrated how much more accurate aggregated predictions can be than individual ones (see, for example, The Wisdom of Crowds or Anatomy of an Experimental Political Stock Market), motivating economists to take their insights seriously. Around the same time, big companies such as Google, Microsoft, and Eli Lilly began establishing company-internal prediction markets. These days, many companies have their own internal prediction tools; for example, we at JetBrains recently launched our own platform, called JetPredict.

Google’s internal market, Prophit, for example, was launched in 2005 and offered financial incentives, plus leaderboard prestige, to the employees best at predicting. Although an internal product, Prophit was known outside of Google as a prediction platform demonstrating relatively high accuracy. It eventually shut down in the late 2000s due to federal regulations (and the 2008 financial crisis did not help either). Many publications covered this topic at the time, for example, the 2005 NYTimes article At Google, the Workers are Placing their Bets, the 2007 Harvard Business case study Prediction Markets at Google, and the 2008 article Using Prediction Markets to Track Information Flows: Evidence from Google. More recently, an article covered Prophit and a second internal market, Gleangen: The Death and Life of Prediction Markets at Google.

Beyond big corporations, researchers have started using formal prediction tools to forecast things like study replicability, a cornerstone of science. In a comparison of forecasting tools and survey-elicited beliefs for predicting replicability, the former were much more accurate than the latter. If you are interested, The Science Prediction Market Project provides a collection of papers on the topic.

Applying forecasting tools to research is still less widespread than forecasting in the business world, but it’s an exciting space to watch!

Different forecasting tools today

Not all forecasting platforms are prediction markets, even if the terms are sometimes used interchangeably. Here, we only look at overall differences without going into detail about, say, the kinds of prediction markets or the math behind the models. If you are interested, WIFPR, Investopedia, and the Corporate Finance Institute provide further resources on these differences.

The hallmark of a prediction market is that participants are offered financial incentives by way of event contracts, sometimes also called ‘shares’. Key concepts include:

  • The event contracts can be sold or bought depending on the participant’s belief in the outcome.
  • The current price reflects what the broader community expects of the outcome.
    • As the nominal contract value is typically USD 1, the Yes and No share prices sum to USD 1 as well. So, if the market’s implied probability is about 60%, a Yes share will cost around 60 cents.
    • Prices change in real-time as new information emerges.
  • If the participant bought contract shares for the correct prediction, they earn money (USD 1 typically) for each share purchased. Incorrect predictions mean no money is earned.

Translating those concepts into an example: A question on the prediction market Kalshi asked whether Anthropic would release Claude 4 before June 1, 2025. At the time of writing this post, the community put the likelihood of Claude 4’s release at 34%, as shown in the figure below.

Source: Kalshi. [Last access: May 16, 2025, 17:25 CEST]

If you wanted to participate in the above market on May 16, the following scenarios could have occurred. If you believed the release would happen before June 1, you could have bought shares for about 35 cents each. Say you bought 100 shares for USD 35 and, come June 1, Anthropic had indeed released Claude 4. You would then have won USD 100 (USD 1 multiplied by 100 shares), and your profit would be USD 65 (your USD 100 winnings minus your USD 35 investment). If Anthropic did not release Claude 4 by June 1, you would have lost your initial USD 35 investment.
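The arithmetic in that scenario generalizes to any position size and price. A minimal sketch (the function is ours for illustration, not part of any Kalshi API):

```python
def profit_usd(num_shares: int, price_cents: int, correct: bool) -> float:
    """Profit or loss on USD 1 event contracts.

    Each share costs `price_cents` cents up front. A correct prediction
    pays out USD 1 per share; an incorrect one pays nothing, so the
    initial stake is lost.
    """
    cost = num_shares * price_cents / 100.0
    payout = float(num_shares) if correct else 0.0
    return payout - cost

# The Claude 4 scenario: 100 shares bought at 35 cents each.
print(profit_usd(100, 35, correct=True))   # → 65.0
print(profit_usd(100, 35, correct=False))  # → -35.0
```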

The figure above additionally shows that earlier in the year, the community thought that Claude 4 was more likely to be released by the resolution date. As more evidence rolls in, the outcome’s likelihood can change.

Aggregating community forecasts is also possible without share-buying and profit-seeking. Other forecasting platforms, such as Good Judgment or Metaculus, use a broader toolset for their prediction architecture, focusing primarily on leveraging collective intelligence and transparent scoring. By removing profit as the primary incentive and instead rewarding forecasters for their prediction accuracy over time, these platforms discourage extreme predictions.
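At its simplest, pooling forecasts without a market can mean taking the median of the individual probabilities. Platforms like Metaculus use considerably more sophisticated, recency-weighted aggregations, so treat this only as an illustration of the idea:

```python
import statistics

def community_prediction(probs: list[float]) -> float:
    """Aggregate individual probability forecasts into one crowd estimate.

    The median is robust to a few extreme forecasts: one forecaster
    saying 90% barely moves the aggregate if most of the crowd disagrees.
    """
    return statistics.median(probs)

# Five forecasters, one of them extreme; the aggregate stays moderate.
print(community_prediction([0.20, 0.35, 0.40, 0.55, 0.90]))  # → 0.4
```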

In particular, Metaculus is building a forecasting ecosystem with a strong empirical infrastructure, using techniques such as Bayesian statistics and machine learning. This creates a platform that is overall more cooperative and has a shared scientific intent. The platform encourages participants to publish the reasoning behind their picks, which fosters community discussions. 

Accuracy and the broader impact of community-sourced forecasting

As forecasting tools become more sophisticated, their predictions are also getting more accurate. In its current state, Metaculus already outperforms robust statistical models, as documented in Forecasting skill of a crowd-prediction platform: A comparison of exchange rate forecasts. The platform additionally keeps an ongoing record of all resolved questions, along with performance statistics.

Metaculus is a platform that not only benefits from community inputs, but also provides vital information to the community. Take the COVID-19 pandemic for example: predictors on Metaculus accurately anticipated the impact of the virus before it was globally recognized as a pandemic. In turn, the insights on specific events within such a pandemic can be valuable to policymakers, like in this case study on an Omicron wave in the US.

Researchers are continuously investigating various public health threats. An open question at the time of writing, on the possibility of the avian influenza virus becoming a public health emergency, is shown in the figure below. What would be your prediction?

Source: Metaculus. [Last access: May 16, 2025]

At JetBrains, our commitment goes beyond delivering top-tier software development solutions and innovative AI tools: We are passionate about nurturing a vibrant, engaged community and creating meaningful opportunities for learning and collaboration. We believe that open dialogue about the future of AI in software development is essential to advancing the field.

With these shared values, we are proud to partner with Metaculus as the host for our forecasting challenge. Together, we look forward to inspiring thoughtful discussion, driving progress, and shaping the future of AI in software development.