I managed to work through this tutorial: https://medium.com/@skillcate/detecting-fake-news-with-a-bert-model-9c666e3cdd9b.
Here are corrections I made to "make it work" (to quote Tim Gunn).
1. In the first chunk, I made the following change:
(This works) from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
(This does not work) from sklearn.metrics import plot_confusion_matrix, which is what the post shows; plot_confusion_matrix was deprecated and has since been removed (I am writing this in October 2023).
This BERT-for-dummies post is dated only about a year ago, and yet scikit-learn no longer accepts plot_confusion_matrix...
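For reference, here is a minimal sketch of the replacement API; the labels and predictions are made-up toy data, not the tutorial's:

```python
# plot_confusion_matrix was deprecated in scikit-learn 1.0 and removed in 1.2;
# ConfusionMatrixDisplay is the replacement.
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Toy labels/predictions just to show the API (not the tutorial's data)
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

cm = confusion_matrix(y_true, y_pred)          # rows = true, cols = predicted
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
# disp.plot() renders the matrix (requires matplotlib)
```

There is also ConfusionMatrixDisplay.from_predictions(y_true, y_pred), which computes and plots in one call.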
2. Before I ran the first chunk, I created a folder and directory in my Google Drive (I am using Google Colab), so that the following cd works:
%cd /content/drive/MyDrive/1_LiveProjects/Project11_FakeNewsDetection
3. In the second chunk, the first two lines that load the data did not work on the first try.
Of course not, because I did not have the csv files yet, duh!
First, you have to download the dataset from Kaggle. Also, "a1_True.csv" and "a2_Fake.csv" do not work unless you rename the Kaggle files. You rather need:
true_data = pd.read_csv('True.csv')
fake_data = pd.read_csv('Fake.csv')
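As a sanity check that this loading step works, here is a self-contained sketch with StringIO standing in for the two Kaggle files; the Target column and the concat step mirror what the tutorial does next, but treat the column name as illustrative:

```python
import pandas as pd
from io import StringIO

# Stand-ins for True.csv / Fake.csv from the Kaggle dataset
true_csv = StringIO("title,text\nA,real story\nB,another real story\n")
fake_csv = StringIO("title,text\nC,fake story\nD,another fake story\n")

true_data = pd.read_csv(true_csv)   # in the notebook: pd.read_csv('True.csv')
fake_data = pd.read_csv(fake_csv)   # in the notebook: pd.read_csv('Fake.csv')

# Label the two sources and stack them into one frame
true_data["Target"] = 1
fake_data["Target"] = 0
data = pd.concat([true_data, fake_data], ignore_index=True)
```

If read_csv raises FileNotFoundError in Colab, double-check that you uploaded the files into the project folder you cd'ed into.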
4. The fine-tuning segment that begins with "# Train and predict" takes time. In my case, it took over an hour.
5. The segment that builds a classification report, which begins with "# load weights of best model", gives an error.
Specifically, this line does not work:
path = 'c1_fakenews_weights.pt'
because in the preceding segment, the weights were saved under a different name, "c2_new_model_weights.pt".
So, I rather have to write:
path = 'c2_new_model_weights.pt'