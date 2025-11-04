The following lines define the loss function used to evaluate the model’s performance, with binary cross-entropy (BCE) loss being a suitable choice for binary classification problems like spam detection. We can then initialize the Adam optimizer, which is used to update the model’s parameters during training, with a learning rate of 0.001.

The learning rate sets how large a step the optimizer takes each time it updates the model’s parameters. If it is too small, training converges slowly; if it is too large, updates can overshoot and training becomes unstable. A learning rate of 0.001 is a common starting point, and it is also the default value for Adam in PyTorch.
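To see why the size of the learning rate matters, here is a minimal sketch that is not part of the notebook. It uses plain gradient descent (not Adam) on the toy function f(w) = w², and the function name and the three rates are chosen only for illustration. The specific values do not transfer to the spam classifier; the point is the qualitative behavior at small, moderate, and overly large rates.

```python
def gradient_descent(lr, steps=20, w=1.0):
    """Minimize f(w) = w**2 with plain gradient descent; return the final w."""
    for _ in range(steps):
        grad = 2 * w         # derivative of w**2
        w = w - lr * grad    # step against the gradient, scaled by the learning rate
    return w

# Illustrative rates only: small, moderate, and too large for this toy problem.
for lr in (0.001, 0.1, 1.1):
    print(f"lr={lr}: final w = {gradient_descent(lr):.4f}")
```

With the smallest rate, w barely moves from its starting value in 20 steps. With the moderate rate, it gets close to the minimum at 0. With the largest rate, each step overshoots and w moves further away. Adam rescales its steps per parameter, so it behaves differently from this sketch, but the same trade-off between speed and stability applies.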

criterion = nn.BCELoss()

optimizer = optim.Adam(model.parameters(), lr=0.001)
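To make the loss concrete, the following sketch computes binary cross-entropy by hand in plain Python. It is an illustration rather than the notebook’s code: the helper name and the example probabilities are made up, and it assumes every prediction lies strictly between 0 and 1, as a sigmoid output does. It averages over the examples, which matches the default reduction of nn.BCELoss.

```python
import math

def binary_cross_entropy(preds, targets):
    """Mean BCE for predicted probabilities in (0, 1) and 0/1 labels."""
    total = 0.0
    for p, y in zip(preds, targets):
        # Only one of the two terms is non-zero for a 0/1 label.
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(preds)

# Hypothetical predictions for one spam (1) and one non-spam (0) message.
confident_right = binary_cross_entropy([0.9, 0.1], [1, 0])
confident_wrong = binary_cross_entropy([0.1, 0.9], [1, 0])
print(f"right: {confident_right:.4f} | wrong: {confident_wrong:.4f}")
```

Confident correct predictions produce a loss near zero, while confident wrong predictions are penalized heavily. That penalty is what pushes the classifier toward well-calibrated probabilities during training.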

The following code checks whether a CUDA-compatible GPU is available and sets the device variable to either “cuda” or “cpu” accordingly. We then move the model to that device so that it can use the GPU for accelerated computation when one is present. Subsequent computations run on the chosen hardware, provided that the input data is moved to the same device. Note that a GPU is not required to run this notebook as-is.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model.to(device)

Output:

SpamClassifier(
  (embedding): Embedding(22754, 64, padding_idx=0)
  (lstm): LSTM(64, 64, batch_first=True)
  (fc): Linear(in_features=64, out_features=1, bias=True)
  (sigmoid): Sigmoid()
)

We are now ready to train the model for 5 epochs, where each epoch is one full pass over the training data. The model is set to training mode at the beginning of each epoch by using model.train(). The training data is then iterated over in batches by using the train_dl data loader, with each batch being transferred to the specified device (GPU or CPU) for processing. For each batch, the optimizer clears the accumulated gradients with optimizer.zero_grad(), the model makes predictions about the input, and the loss is calculated by using the criterion. Calling loss.backward() computes the gradients, optimizer.step() updates the model parameters, and the batch loss is added to train_loss. At the end of each epoch, the average loss per batch is calculated and printed, providing a measure of how well the model fits the training data.

epochs = 5

train_losses = []

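for epoch in range(epochs):
    model.train()  # enable training mode
    train_loss = 0
    for X, y in train_dl:
        X, y = X.to(device), y.to(device)  # move the batch to the chosen device
        optimizer.zero_grad()  # clear gradients from the previous batch
        preds = model(X)
        loss = criterion(preds, y)
        loss.backward()  # compute gradients
        optimizer.step()  # update parameters
        train_loss += loss.item()
    # average loss per batch for this epoch
    train_losses.append(train_loss/len(train_dl))
    print(f"Epoch {epoch+1} | Loss: {train_loss/len(train_dl):.4f}")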

Output:

Epoch 1 | Loss: 0.4563
Epoch 2 | Loss: 0.2659
Epoch 3 | Loss: 0.2312
Epoch 4 | Loss: 0.1316
Epoch 5 | Loss: 0.0936
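The same loop structure can be traced without PyTorch. The sketch below is a simplified stand-in, not the notebook’s model: it assumes a one-feature logistic regression, hypothetical toy data, and plain stochastic gradient descent in place of Adam, with the gradients written out by hand. Comments mark which PyTorch call each step loosely corresponds to.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy batches of (feature, label); label is 1 when the feature is positive.
batches = [
    [(-2.0, 0), (1.5, 1)],
    [(-1.0, 0), (2.0, 1)],
    [(-0.5, 0), (0.5, 1)],
]

w, b = 0.0, 0.0    # model parameters
lr = 0.5           # learning rate for plain SGD, chosen for this toy problem
epoch_losses = []

for epoch in range(5):
    running_loss = 0.0
    for batch in batches:
        grad_w, grad_b = 0.0, 0.0        # like optimizer.zero_grad()
        batch_loss = 0.0
        for x, y in batch:
            p = sigmoid(w * x + b)       # forward pass, like model(X)
            batch_loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            grad_w += (p - y) * x        # BCE gradient for a sigmoid output, like loss.backward()
            grad_b += (p - y)
        n = len(batch)
        w -= lr * grad_w / n             # like optimizer.step()
        b -= lr * grad_b / n
        running_loss += batch_loss / n   # like train_loss += loss.item()
    epoch_losses.append(running_loss / len(batches))
    print(f"Epoch {epoch + 1} | Loss: {epoch_losses[-1]:.4f}")
```

Because gradients are reset for every batch, each update reflects only the current batch. The value recorded per epoch is the mean of the batch losses, which is the quantity that train_loss/len(train_dl) reports in the notebook.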

The printed losses fall with every epoch, from 0.4563 in the first epoch to 0.0936 in the fifth. Let’s plot the loss against the epoch number to visualize these results.

epochs = np.arange(1, len(train_losses) + 1)

plt.figure(figsize=(12, 5))

plt.plot(epochs, train_losses, label='Training Loss', color='blue')

plt.title('Training loss over epochs')

plt.xlabel('Epoch')

plt.ylabel('Training loss')

plt.legend()

plt.show()

Output: