In this group project, we developed machine learning models to predict the total weight of fish catches from Norwegian fishing vessels. Using a dataset containing over 300,000 fishing operation reports from 2018, we implemented and compared various regression models including Linear Regression, LASSO, Ridge, Random Forest, and Neural Networks (MLP).
Norwegian fishing vessels are required to report detailed information about their fishing operations. This creates a rich dataset including vessel specifications, location data, and catch details. Our goal was to use this data to create accurate predictions of catch weights, which could help in fleet management and resource planning.
The raw data required significant preprocessing before it could be used for modelling.
We implemented several machine learning approaches, progressively improving our predictions:
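# Imports used by these snippets
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# X_train, y_train and X_val come from the project's train/validation split (not shown)

# Example of our Ridge Regression implementation:
# scale every feature to [0, 1], then fit Ridge with a small L2 penalty
ridge_reg = make_pipeline(MinMaxScaler(),
                          Ridge(alpha=0.001, random_state=42))
ridge_reg.fit(X_train, y_train)

# Our best performing model: Random Forest
forest = RandomForestRegressor(random_state=42)
forest.fit(X_train, y_train)
forest_predicted_values = forest.predict(X_val)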
Our model comparison showed clear differences in prediction accuracy:
The Random Forest model clearly outperformed the other approaches, explaining 74% of the variance in catch weights (an R² of about 0.74). This suggests that non-linear relationships in the data are important for accurate predictions, since the linear models cannot capture them.
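To illustrate how such a comparison can be run, here is a minimal sketch that fits several regressors and scores each one by R² on a held-out validation split. It does not use the project's data: the feature matrix and target are a synthetic stand-in with a deliberately non-linear relationship, and the names `models` and `scores` are our own choices for this example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (an assumption, NOT the fishing dataset):
# the target depends on the square of the first feature, plus a little noise.
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(600, 3))
y = X[:, 0] ** 2 + 0.05 * rng.normal(size=600)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Candidate models, configured like the snippets in this write-up where possible.
models = {
    "linear": LinearRegression(),
    "ridge": make_pipeline(MinMaxScaler(), Ridge(alpha=0.001, random_state=42)),
    "forest": RandomForestRegressor(random_state=42),
}

# Fit each model on the training split and record R^2 on the validation split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_val, model.predict(X_val))

for name, score in scores.items():
    print(f"{name}: R^2 = {score:.3f}")
```

On this toy target the linear models have almost nothing to fit, while the forest recovers the curved relationship. That is the same qualitative pattern as in our results, although the numbers here say nothing about the real data.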
We also performed unsupervised learning using K-means clustering to identify patterns in fishing operations.
The elbow method is a technique used to determine the optimal number of clusters in a dataset by plotting the within-cluster sum of squares (WCSS) against the number of clusters.
Reading the elbow is subjective, but the curve appeared to bend at k = 2, so we proceeded with two clusters.
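The elbow computation itself can be sketched as follows. This is an illustration under stated assumptions rather than our exact code: the data is a synthetic two-blob stand-in, the range of k values is arbitrary, and the WCSS is read from scikit-learn's `inertia_` attribute on a fitted `KMeans` model.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data (an assumption): two well-separated groups in 2D.
X, _ = make_blobs(
    n_samples=400,
    centers=[[-5.0, -5.0], [5.0, 5.0]],
    cluster_std=1.0,
    random_state=42,
)

# Fit K-means for each candidate k and record the within-cluster sum of squares.
k_values = list(range(1, 9))
wcss = []
for k in k_values:
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)  # inertia_ is the WCSS of the fitted model

# The "elbow" is where the drop in WCSS from one k to the next flattens out.
for k, value in zip(k_values, wcss):
    print(f"k={k}: WCSS={value:.1f}")
```

Plotting `wcss` against `k_values` gives the usual elbow curve. With two well-separated groups, the largest drop is from k = 1 to k = 2 and later values of k add little.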
Using PCA for dimensionality reduction, we identified two distinct clusters in the data.
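A minimal sketch of combining K-means with PCA is shown here. Several details are assumptions made for illustration: the data is synthetic, the features are standardized first, clustering is done in the full feature space, and PCA is used only to project the points to two components for inspection. The project may have ordered these steps differently.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption): two groups in a 6-dimensional space.
centers = [[-4.0] * 6, [4.0] * 6]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=1.0, random_state=42)

# Standardize so that no single feature dominates the distance computations.
X_scaled = StandardScaler().fit_transform(X)

# Cluster into two groups, matching the k suggested by the elbow method.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# Project to two principal components so the clusters can be plotted in 2D.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print("Share of variance kept by 2 components:", pca.explained_variance_ratio_.sum())
print("Points per cluster:", [int((labels == c).sum()) for c in (0, 1)])
```

A scatter plot of `X_2d` coloured by `labels` then shows whether the two clusters separate in the reduced space.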