I was finishing my Formula 1 Cornering Performance Breakdown project when I came up with the idea of using the existing code infrastructure to generate cornering data for every track on the 2024 calendar. As part of that project, I tagged every corner with a label indicating how fast it is. With that information, I can determine how much time is spent in each type of corner. The remainder of the lap time, which isn't spent negotiating a corner, is the time the drivers spend at full throttle, tagged below as STRAIGHT.

Corner type        LOW      MEDIUM-LOW     MEDIUM-HIGH    HIGH     STRAIGHT
Corner apex speed  <100kph  100kph-150kph  150kph-200kph  >200kph  Full throttle
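The tagging rule in the table can be written as a small function. This is a minimal sketch, not the project's actual code: the name `classify_corner` is hypothetical, and the handling of a speed that falls exactly on 150 kph is an assumption, since the table leaves that boundary open.

```python
def classify_corner(apex_speed_kph: float) -> str:
    """Return the speed tag for a corner, given its apex speed in kph.

    Assumption: exactly 150 kph counts as MEDIUM-HIGH. The table only says
    that LOW is below 100 kph and HIGH is above 200 kph.
    """
    if apex_speed_kph < 100:
        return "LOW"
    if apex_speed_kph < 150:
        return "MEDIUM-LOW"
    if apex_speed_kph <= 200:
        return "MEDIUM-HIGH"
    return "HIGH"


# A few illustrative apex speeds (not taken from any real corner).
for speed in (80, 120, 180, 250):
    print(speed, classify_corner(speed))
```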

The data

Now that we’ve gone over the basics, let’s take a moment to understand how to interpret the data that will be shown in this blog post. Each value is the time, in seconds, spent in that category over one lap. Take a look at the example below:

                   STRAIGHT     LOW  MEDIUM-LOW  MEDIUM-HIGH    HIGH
Azerbaijan GP        51.201  24.801      24.201        0.000   0.000
Qatar GP             29.035   6.520      10.301       16.001  21.921

We can see that all of the cornering time at the Azerbaijan GP is spent in low and medium-low-speed corners, whereas medium-high and high-speed corners account for most of the cornering time at the Qatar GP. This makes complete sense and matches our preconceptions of these two tracks: Azerbaijan is a street track, and Qatar is a flowing MotoGP track.
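To put numbers on that reading, each corner type can be expressed as a share of the cornering time alone, with the straights left out. The snippet below is a small sketch using only the two example rows; the variable names are my own.

```python
import pandas as pd

# Seconds per lap, copied from the two example rows above.
example = pd.DataFrame(
    {
        "STRAIGHT": [51.201, 29.035],
        "LOW": [24.801, 6.520],
        "MEDIUM-LOW": [24.201, 10.301],
        "MEDIUM-HIGH": [0.000, 16.001],
        "HIGH": [0.000, 21.921],
    },
    index=["Azerbaijan GP", "Qatar GP"],
)

corner_cols = ["LOW", "MEDIUM-LOW", "MEDIUM-HIGH", "HIGH"]
corners = example[corner_cols]

# Share (in %) of the cornering time spent in each corner type.
corner_share = corners.div(corners.sum(axis=1), axis=0) * 100
print(corner_share.round(1))
```

For Azerbaijan, all of the cornering time falls in the two slowest categories; for Qatar, roughly 69% of it falls in the two fastest.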

Without further ado, here’s the data for all 24 GPs of the 2024 F1 calendar:

                   STRAIGHT     LOW  MEDIUM-LOW  MEDIUM-HIGH    HIGH
Bahrain GP           37.177  23.139      16.394        4.899   8.099
Saudi Arabian GP     44.187   7.720       6.880       19.399  10.079
Australian GP        31.573   4.516      14.603        6.166  19.874
Azerbaijan GP        51.201  24.801      24.201        0.000   0.000
Miami GP             39.961  35.099       6.654        0.000   5.100
Monaco GP            24.805  29.750      10.180        6.630   0.000
Spanish GP           30.637   0.000      17.951       14.060   9.624
Austrian GP          34.312   6.522       9.658        2.220  11.679
British GP           39.236   4.900      18.582        4.320  19.682
Hungarian GP         27.946   5.800      27.523       10.561   4.779
Italian GP           44.692   9.579       6.516       13.027   6.480
Singapore GP         33.632  22.850      27.577        4.119   2.806
Japanese GP          42.847  16.477       5.660       15.734   8.159
Qatar GP             29.035   6.520      10.301       16.001  21.921
United States GP     35.539  32.403      10.903        2.963  12.915
Mexico City GP       27.789  25.319      13.496       10.562   0.000
São Paulo GP         33.454   6.120      19.887        7.080   3.480
Las Vegas GP         49.586  26.140      13.200        0.000   3.800
Abu Dhabi GP         37.663   8.720      17.381       13.180   6.501
Chinese GP           35.122  33.319       8.646        6.646   7.814
Dutch GP             26.664   7.822      22.118        5.655   8.083
Belgian GP           46.945  17.060       7.678       21.639  10.343
Canadian GP          34.337  14.883      21.020        0.000   0.000
Emilia Romagna GP    38.293   0.000      21.866       11.245   3.007

This is a good starting point, but before we start running clustering algorithms on this data, we should normalize it in relation to the lap time of each track. Doing this prevents a track like Spa from appearing to have more time spent in corners simply because its lap is longer. The Python code below does just that by normalizing the data to a 100-second (1m40s) lap time, so each value can also be read as a percentage of the lap:

import pandas as pd

COLUMNS = ["STRAIGHT", "LOW", "MEDIUM-LOW", "MEDIUM-HIGH", "HIGH"]

def normalize(dt: pd.DataFrame) -> pd.DataFrame:
    # Work on a copy so the caller's DataFrame is left untouched.
    res = dt.copy(deep=True)
    # Lap time of each track: the sum of the time spent in every category.
    total = res[COLUMNS].sum(axis=1)
    # Rescale every row so that its categories add up to 100 seconds.
    for c in COLUMNS:
        res[c] = res[c] * 100 / total
    return res
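As a quick sanity check, the same arithmetic can be applied to a single row. The sketch below uses an equivalent vectorized form (`DataFrame.div`) on the Bahrain GP row from the raw table; it is a check of the calculation, not a replacement for the function above.

```python
import pandas as pd

COLUMNS = ["STRAIGHT", "LOW", "MEDIUM-LOW", "MEDIUM-HIGH", "HIGH"]

# Bahrain GP row from the raw table above (seconds per lap).
raw = pd.DataFrame(
    [[37.177, 23.139, 16.394, 4.899, 8.099]],
    index=["Bahrain GP"],
    columns=COLUMNS,
)

# Divide each row by its own total, then rescale to a 100-second lap.
normalized = raw.div(raw.sum(axis=1), axis=0) * 100

print(raw.sum(axis=1).round(3))
print(normalized.round(3))
```

The Bahrain row sums to 89.708 seconds, so its STRAIGHT value becomes 37.177 × 100 / 89.708 ≈ 41.44, and the five normalized values add up to 100.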

The normalized data:

                    STRAIGHT        LOW  MEDIUM-LOW  MEDIUM-HIGH       HIGH
Bahrain GP         41.442235  25.793686   18.274847     5.461051   9.028180
Saudi Arabian GP   50.061746   8.746389    7.794709    21.978134  11.419022
Australian GP      41.147109   5.885419   19.031173     8.035761  25.900537
Azerbaijan GP      51.097273  24.750756   24.151971     0.000000   0.000000
Miami GP           46.030594  40.430115    7.664662     0.000000   5.874629
Monaco GP          34.757935  41.687102   14.264696     9.290268   0.000000
Spanish GP         42.391244   0.000000   24.838112    19.454284  13.316360
Austrian GP        53.286950  10.128745   14.998991     3.447687  18.137628
British GP         45.244465   5.650369   21.427583     4.981550  22.696033
Hungarian GP       36.478743   7.570912   35.926588    13.785587   6.238170
Italian GP         55.660448  11.929908    8.115177    16.224126   8.070341
Singapore GP       36.964741  25.114306   30.309725     4.527170   3.084059
Japanese GP        48.209323  18.539105    6.368352    17.703118   9.180103
Qatar GP           34.657070   7.782473   12.295591    19.099286  26.165580
United States GP   37.518871  34.208165   11.510404     3.128068  13.634492
Mexico City GP     36.011974  32.811083   17.489568    13.687375   0.000000
São Paulo GP       47.777095   8.740235   28.401480    10.111252   4.969938
Las Vegas GP       53.475832  28.190583   14.235490     0.000000   4.098095
Abu Dhabi GP       45.135119  10.449997   20.829289    15.794835   7.790760
Chinese GP         38.364993  36.395513    9.444329     7.259659   8.535506
Dutch GP           37.906230  11.119957   31.443519     8.039294  11.491001
Belgian GP         45.285294  16.456856    7.406550    20.873969   9.977331
Canadian GP        48.885251  21.188781   29.925968     0.000000   0.000000
Emilia Romagna GP  51.461477   0.000000   29.385440    15.112013   4.041069

Time to run the k-means clustering method

Now that we have our data, we can run k-means clustering on it. I decided to go for 4 clusters, even though the elbow method only suggested 2.

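from typing import Dict, List, Tuple

import pandas as pd
from sklearn.cluster import KMeans

def kmeans_clustering(dt: pd.DataFrame, n: int = 2) -> Dict[int, Tuple[pd.DataFrame, List[str]]]:
    # Fit k-means on the normalized data; n is the number of clusters.
    kmeans = KMeans(n_clusters=n, n_init='auto')
    kmeans.fit(dt)

    # Attach each track's cluster label to a copy of the input.
    dt1 = dt.copy(deep=True)
    dt1["Cluster"] = kmeans.labels_

    # Map each cluster id to its rows and to the list of track names in it.
    res = {}
    for i in range(n):
        gp_cluster = dt1[dt1["Cluster"] == i]
        print(gp_cluster)
        print(list(gp_cluster.index))
        print()
        res[i] = (gp_cluster, list(gp_cluster.index))

    return res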
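The elbow check mentioned above isn't shown in this post, so here is one common way to run it, as a sketch under my own assumptions: fit k-means for a range of cluster counts and look at how the inertia (the within-cluster sum of squared distances) drops. The function name `elbow_inertias`, the `max_k` range, and the fixed `seed` are all hypothetical choices.

```python
import pandas as pd
from sklearn.cluster import KMeans

COLUMNS = ["STRAIGHT", "LOW", "MEDIUM-LOW", "MEDIUM-HIGH", "HIGH"]

# Raw seconds per lap, copied from the table earlier in this post.
RAW = {
    "Bahrain GP": [37.177, 23.139, 16.394, 4.899, 8.099],
    "Saudi Arabian GP": [44.187, 7.720, 6.880, 19.399, 10.079],
    "Australian GP": [31.573, 4.516, 14.603, 6.166, 19.874],
    "Azerbaijan GP": [51.201, 24.801, 24.201, 0.000, 0.000],
    "Miami GP": [39.961, 35.099, 6.654, 0.000, 5.100],
    "Monaco GP": [24.805, 29.750, 10.180, 6.630, 0.000],
    "Spanish GP": [30.637, 0.000, 17.951, 14.060, 9.624],
    "Austrian GP": [34.312, 6.522, 9.658, 2.220, 11.679],
    "British GP": [39.236, 4.900, 18.582, 4.320, 19.682],
    "Hungarian GP": [27.946, 5.800, 27.523, 10.561, 4.779],
    "Italian GP": [44.692, 9.579, 6.516, 13.027, 6.480],
    "Singapore GP": [33.632, 22.850, 27.577, 4.119, 2.806],
    "Japanese GP": [42.847, 16.477, 5.660, 15.734, 8.159],
    "Qatar GP": [29.035, 6.520, 10.301, 16.001, 21.921],
    "United States GP": [35.539, 32.403, 10.903, 2.963, 12.915],
    "Mexico City GP": [27.789, 25.319, 13.496, 10.562, 0.000],
    "São Paulo GP": [33.454, 6.120, 19.887, 7.080, 3.480],
    "Las Vegas GP": [49.586, 26.140, 13.200, 0.000, 3.800],
    "Abu Dhabi GP": [37.663, 8.720, 17.381, 13.180, 6.501],
    "Chinese GP": [35.122, 33.319, 8.646, 6.646, 7.814],
    "Dutch GP": [26.664, 7.822, 22.118, 5.655, 8.083],
    "Belgian GP": [46.945, 17.060, 7.678, 21.639, 10.343],
    "Canadian GP": [34.337, 14.883, 21.020, 0.000, 0.000],
    "Emilia Romagna GP": [38.293, 0.000, 21.866, 11.245, 3.007],
}

data = pd.DataFrame.from_dict(RAW, orient="index", columns=COLUMNS)
# Same normalization as earlier: every row rescaled to a 100-second lap.
data = data.div(data.sum(axis=1), axis=0) * 100


def elbow_inertias(dt: pd.DataFrame, max_k: int = 8, seed: int = 0) -> pd.Series:
    """Fit k-means for k = 1..max_k and return the inertia of each fit."""
    inertias = {}
    for k in range(1, max_k + 1):
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(dt)
        inertias[k] = model.inertia_
    return pd.Series(inertias, name="inertia")


inertias = elbow_inertias(data)
print(inertias.round(1))
```

The "elbow" is the cluster count after which the inertia stops dropping sharply. Picking more clusters than the elbow suggests, as I did here, trades a statistically cleaner split for groups that are easier to describe.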

Cluster 1: The street tracks with high top speeds

Every track in this cluster is dominated by low and medium-low-speed corners. The length of the straights on these circuits (except Singapore) makes top speed a priority, which conflicts with the requirements of the low-speed corners.

                STRAIGHT        LOW  MEDIUM-LOW  MEDIUM-HIGH      HIGH  Cluster
Azerbaijan GP  51.097273  24.750756   24.151971      0.00000  0.000000        0
Singapore GP   36.964741  25.114306   30.309725      4.52717  3.084059        0
Las Vegas GP   53.475832  28.190583   14.235490      0.00000  4.098095        0
Canadian GP    48.885251  21.188781   29.925968      0.00000  0.000000        0

['Azerbaijan GP', 'Singapore GP', 'Las Vegas GP', 'Canadian GP']
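One way to back up a cluster description like this one is to average each column over the cluster's members. The sketch below does that for the four tracks listed above, using the normalized values from the table; the variable names are my own.

```python
import pandas as pd

COLUMNS = ["STRAIGHT", "LOW", "MEDIUM-LOW", "MEDIUM-HIGH", "HIGH"]

# Normalized values of the four tracks in this cluster, copied from the table above.
cluster = pd.DataFrame(
    [
        [51.097273, 24.750756, 24.151971, 0.00000, 0.000000],
        [36.964741, 25.114306, 30.309725, 4.52717, 3.084059],
        [53.475832, 28.190583, 14.235490, 0.00000, 4.098095],
        [48.885251, 21.188781, 29.925968, 0.00000, 0.000000],
    ],
    index=["Azerbaijan GP", "Singapore GP", "Las Vegas GP", "Canadian GP"],
    columns=COLUMNS,
)

# Average profile of the cluster: mean of every column across its tracks.
profile = cluster.mean()
print(profile.round(1))
```

On average, these four tracks spend about 49% of the lap in low and medium-low-speed corners and under 3% in the two fastest categories.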

Cluster 2: The flowing high-speed racetracks

Every track in this cluster (except Monza) could be a MotoGP track, looking purely at the track layouts. This set of tracks rewards teams that can run a lot of downforce while maintaining a respectable top speed.

                   STRAIGHT        LOW  MEDIUM-LOW  MEDIUM-HIGH       HIGH  Cluster
Saudi Arabian GP  50.061746   8.746389    7.794709    21.978134  11.419022        1
Australian GP     41.147109   5.885419   19.031173     8.035761  25.900537        1
Austrian GP       53.286950  10.128745   14.998991     3.447687  18.137628        1
British GP        45.244465   5.650369   21.427583     4.981550  22.696033        1
Italian GP        55.660448  11.929908    8.115177    16.224126   8.070341        1
Japanese GP       48.209323  18.539105    6.368352    17.703118   9.180103        1
Qatar GP          34.657070   7.782473   12.295591    19.099286  26.165580        1
Belgian GP        45.285294  16.456856    7.406550    20.873969   9.977331        1

['Saudi Arabian GP', 'Australian GP', 'Austrian GP', 'British GP',
 'Italian GP', 'Japanese GP', 'Qatar GP', 'Belgian GP']

Cluster 3: The traction-sensitive racetracks

You will struggle at these tracks if your car has poor handling characteristics. This cluster is dominated by low-speed corners, which puts the spotlight on suspension quality and ride height.

                   STRAIGHT        LOW  MEDIUM-LOW  MEDIUM-HIGH       HIGH  Cluster
Bahrain GP        41.442235  25.793686   18.274847     5.461051   9.028180        2
Miami GP          46.030594  40.430115    7.664662     0.000000   5.874629        2
Monaco GP         34.757935  41.687102   14.264696     9.290268   0.000000        2
United States GP  37.518871  34.208165   11.510404     3.128068  13.634492        2
Mexico City GP    36.011974  32.811083   17.489568    13.687375   0.000000        2
Chinese GP        38.364993  36.395513    9.444329     7.259659   8.535506        2

['Bahrain GP', 'Miami GP', 'Monaco GP',
 'United States GP', 'Mexico City GP', 'Chinese GP']

Cluster 4: The all-rounders

I have nothing interesting to say about these tracks besides that you need a car that is good in both medium-low and medium-high-speed corners.

                    STRAIGHT        LOW  MEDIUM-LOW  MEDIUM-HIGH       HIGH  Cluster
Spanish GP         42.391244   0.000000   24.838112    19.454284  13.316360        3
Hungarian GP       36.478743   7.570912   35.926588    13.785587   6.238170        3
São Paulo GP       47.777095   8.740235   28.401480    10.111252   4.969938        3
Abu Dhabi GP       45.135119  10.449997   20.829289    15.794835   7.790760        3
Dutch GP           37.906230  11.119957   31.443519     8.039294  11.491001        3
Emilia Romagna GP  51.461477   0.000000   29.385440    15.112013   4.041069        3

['Spanish GP', 'Hungarian GP', 'São Paulo GP', 
 'Abu Dhabi GP', 'Dutch GP', 'Emilia Romagna GP']

Final remarks

There are multiple ways to improve the clustering of the racetracks: the two I can think of off the top of my head are adding the average speed and the longest continuous throttle-on time (i.e., longest straight) as two new columns. Calculating the isochronal ratio would also be interesting, but generating that data would be far too time-consuming for me.
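One practical note if such columns are added: they would be in different units (a speed and a duration) than the existing time shares, and k-means relies on Euclidean distance, so the column with the largest numeric range would dominate the clustering. A common remedy is to standardize every column first. The helper below is a minimal sketch of that step; the name `standardize` is hypothetical and it is not part of the project's code.

```python
import pandas as pd


def standardize(features: pd.DataFrame) -> pd.DataFrame:
    """Rescale every column to zero mean and unit variance (z-scores)."""
    std = features.std(ddof=0)
    # A constant column has zero spread; divide by 1 instead so it stays at 0.
    std = std.replace(0, 1)
    return (features - features.mean()) / std
```

The standardized frame can then be passed to the clustering function in place of the raw one.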

If you have ideas for iterating on this project, you are more than welcome to try them! The code is available on GitHub.