Disclaimer: I am fairly new to the library and this post only shows what I've observed so far. I may be missing a point or two, so don't treat what follows as a comprehensive list of what fastai can do. This is just my attempt to keep learning and evolving.

Introduction

Below is the general plan that most of us follow when training a Machine Learning model:

  • Load Data
  • Inspect Data: Plot a few examples
  • Create a DataLoader
  • Define a model architecture.
  • Write a training loop.
  • Plot metrics.

There's also another step, which is

  • Analyze Errors.

but we'll tackle it in a separate blog post, once we've covered the model-training part.

PyTorch Version

import math
import os
import re
import requests
import tarfile
from collections import OrderedDict

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image

import torch
from torch import nn
from torch import optim
from torch.utils.data import dataset, dataloader
from torch.nn import functional as F
from torchvision import transforms
from torchvision.models.resnet import resnet34
from tqdm.notebook import tqdm
from sklearn.model_selection import train_test_split

Fetch Data

The data is a tar-gzip archive; to extract the data files from it we'll use the tarfile package. But first we need to download the archive.

The code in the cell below is taken from: https://gist.github.com/devhero/8ae2229d9ea1a59003ced4587c9cb236#gistcomment-3775721.

def fetch_data(url, data_dir, download=False):
    if download:
        # Stream the archive and extract it on the fly, without saving the .tgz to disk first
        response = requests.get(url, stream=True)
        file = tarfile.open(fileobj=response.raw, mode="r|gz")
        file.extractall(path=data_dir)

In the interest of comparison, I'll first write the Dataset class by hand so we can see how much easier it gets when we use fastai. The URL that we want to fetch the data from is here: Pets Dataset

pets_url = 'https://s3.amazonaws.com/fast-ai-imageclas/oxford-iiit-pet.tgz'
data_dir = os.path.join('gdrive', 'MyDrive', 'pets_data')
base_img_dir = os.path.join(data_dir, 'oxford-iiit-pet', 'images')
fetch_data(pets_url, data_dir, download=True)  # set download=False on re-runs to skip re-downloading

We've extracted the data into the folder named pets_data. On inspection, it looks like the folder pets_data/oxford-iiit-pet/images contains all the images we want (some files need to be filtered out as they're not in JPEG format). The filenames carry the category labels in the format: <CATEGORYNAME>_<NUMBER>.jpg.

Extract Labels

In order to extract the category name from the file names, fastbook uses a RegexLabeller. We'll write a similar RegexLabelExtractor (much inferior in functionality to fastai's RegexLabeller, but it does the job for now).

class RegexLabelExtractor():
    def __init__(self, pattern):
        self.pattern = pattern

    def __call__(self, iterable):
        # Pull the first regex match (the label) out of every string
        return [re.findall(self.pattern, value)[0] for value in iterable]

As mentioned before, our version, RegexLabelExtractor, extracts the label from a given text. It accepts a pattern during instantiation, and on __call__ it expects an iterable of strings containing the labels. It returns all the label names in a Python list.
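As a quick sanity check, with a couple of made-up filenames:

RegexLabelExtractor(r'(.+)_\d+.jpg$')(['Abyssinian_1.jpg', 'beagle_12.jpg'])
# ['Abyssinian', 'beagle']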

Once we've defined a class for extracting the labels, we'd like a container responsible for maintaining a map of CATEGORYNAME -> ID, which we'll use to convert the labels to integer format and back.

Below we define a LabelManager. It exposes id_for_label and label_for_id methods* along with a keys property, which returns the unique label names in our dataset (this is our vocabulary). We can also call len on a LabelManager object to get the number of output classes.

*These look up OrderedDicts that the class maintains internally.

class LabelManager():
    def __init__(self, labels):
        # Assign ids to labels in first-seen order
        self._label_to_idx = OrderedDict()
        for label in labels:
            if label not in self._label_to_idx:
                self._label_to_idx[label] = len(self._label_to_idx)
        # Reverse map for decoding predictions back to label names
        self._idx_to_label = {v:k for k,v in self._label_to_idx.items()}
    
    @property
    def keys(self):
        return list(self._label_to_idx.keys())
    
    def id_for_label(self, label):
        return self._label_to_idx[label]
    
    def label_for_id(self, idx):
        return self._idx_to_label[idx]
    
    def __len__(self):
        return len(self._label_to_idx)
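A minimal usage sketch, again with made-up labels:

label_manager = LabelManager(['beagle', 'pug', 'beagle'])
label_manager.keys                  # ['beagle', 'pug']
label_manager.id_for_label('pug')   # 1
label_manager.label_for_id(0)       # 'beagle'
len(label_manager)                  # 2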

Data Splitter

We'd also like to split our dataset into train and validation subsets. Although the dataset provides its own lists of train and validation splits, to be consistent with the book we'll write our own version of the RandomSplitter (again much inferior in functionality, but it will do for the purposes of demonstration).

We'd like this Splitter to accept a percentage to split on and also a seed for reproducibility.

class Splitter():
    def __init__(self, valid_pct=0.2, seed=None):
        self.seed = seed
        self.valid_pct = valid_pct
    
    def __call__(self, dataset):
        # Delegate to sklearn; a fixed seed makes the split reproducible
        return train_test_split(dataset, test_size=self.valid_pct, random_state=np.random.RandomState(self.seed))
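For example, splitting a toy list of eight items:

train, valid = Splitter(valid_pct=0.25, seed=42)(list(range(8)))
# len(train) == 6, len(valid) == 2, and the same seed reproduces the same split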

Writing a PyTorch Dataset

Now that we have a way to extract labels, maintain them in a map, and split the data into train and validation subsets, we'll define PetsDataset (a PyTorch Dataset), which the PyTorch DataLoader will use to feed data to our model during training.

A note on PyTorch Dataset: A PyTorch dataset is a primitive provided by the library that stores the samples and their corresponding labels. In order to write a custom dataset, our class PetsDataset needs to implement three functions: __init__, __len__, and __getitem__.

class PetsDataset(dataset.Dataset):
    def __init__(self, data, tfms=None):
        super(PetsDataset, self).__init__()
        self.data = data  # list of (image_path, label_id) tuples
        self.transforms = tfms
    
    def __getitem__(self, idx):
        # Open the image lazily, only when the sample is requested
        X = Image.open(self.data[idx][0])
        if X.mode != 'RGB':
            X = X.convert('RGB')  # force 3 channels so batches can be stacked
        y = self.data[idx][1]
        if self.transforms:
            X = self.transforms(X)
        return (X, y)
    
    def __len__(self):
        return len(self.data)
    

Notice how we're opening the image only when __getitem__ is called, and that we make sure all the images have 3 input channels, hence the check if X.mode != 'RGB'. Some images in the dataset aren't RGB, and if we don't convert them to 3 input channels, the DataLoader won't be able to create a batch using torch.stack.
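To see the failure mode in isolation, here's a minimal illustration:

rgb = torch.rand(3, 224, 224)
gray = torch.rand(1, 224, 224)
torch.stack([rgb, rgb]).shape  # torch.Size([2, 3, 224, 224])
torch.stack([rgb, gray])       # RuntimeError: stack expects each tensor to be equal size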

We're now ready to use these datasets, but we need to make sure that our global map of CATEGORYNAME -> ID is constructed using both the train and the validation splits. We'll also have this class hold the corresponding datasets.

class DatasetManager():
    
    def __init__(self, base_dir, paths, label_extractor, tfms=None, valid_pct=0.2, seed=None):
        self._labels = label_extractor(paths)
        self.tfms = tfms
        self._label_manager = LabelManager(self._labels)
        self._label_ids = [self.label_manager.id_for_label(label) for label in self._labels]

        self.abs_paths = [os.path.join(base_dir, path) for path in paths]
        self.train_data, self.valid_data = Splitter(valid_pct=valid_pct, seed=seed)(list(zip(self.abs_paths, self._label_ids)))
        
        
    @property
    def label_manager(self):
        return self._label_manager
    
    @property
    def train_dataset(self):
        return PetsDataset(self.train_data, tfms=self.tfms)

    @property
    def valid_dataset(self):    
        return PetsDataset(self.valid_data, tfms=self.tfms)
    

We'll now use all the helper classes we've created so far to build the datasets for a dataloader, then move on to choosing an architecture and training it (almost there).

paths = [path for path in sorted(os.listdir(base_img_dir)) if path.endswith('.jpg')]
pattern = r'(.+)_\d+.jpg$'
regex_label_extractor = RegexLabelExtractor(pattern)
dataset_manager = DatasetManager(base_img_dir, paths, regex_label_extractor, 
                                 tfms=transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()]), 
                                 seed=42)
train_dataset = dataset_manager.train_dataset
valid_dataset = dataset_manager.valid_dataset
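A quick sanity check on the splits (the exact sizes depend on the ~7.4k images extracted above):

x, y = train_dataset[0]
print(len(train_dataset), len(valid_dataset))  # roughly an 80/20 split
print(x.shape, y)  # torch.Size([3, 224, 224]) and an integer label id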

Before we look at the model, let's do a quick sanity check: look at the labels we're dealing with and plot a few images. This is just to make sure things are working as expected, the dataloader will batch the data the right way, and our Trainer won't crash midway.

df = pd.DataFrame(dataset_manager.label_manager.keys, columns=['label_name'])
df.head(len(df))

label_name
0 Abyssinian
1 Bengal
2 Birman
3 Bombay
4 British_Shorthair
5 Egyptian_Mau
6 Maine_Coon
7 Persian
8 Ragdoll
9 Russian_Blue
10 Siamese
11 Sphynx
12 american_bulldog
13 american_pit_bull_terrier
14 basset_hound
15 beagle
16 boxer
17 chihuahua
18 english_cocker_spaniel
19 english_setter
20 german_shorthaired
21 great_pyrenees
22 havanese
23 japanese_chin
24 keeshond
25 leonberger
26 miniature_pinscher
27 newfoundland
28 pomeranian
29 pug
30 saint_bernard
31 samoyed
32 scottish_terrier
33 shiba_inu
34 staffordshire_bull_terrier
35 wheaten_terrier
36 yorkshire_terrier

Data Inspection

A method to plot one batch of data (inspired by fastai, of course, but again a very curtailed version of what that function does). Notice how we're calling transforms.ToPILImage(): that's because the batch contains torch.Tensor objects, and in order to plot them we need to convert them to PIL.Image. Everything else is just there to align the images nicely across the different panels.

def plot_one_batch(batch, max_images=9):
    # Aim for a square grid; add extra rows when max_images isn't a perfect square
    nrows = int(math.sqrt(max_images))
    ncols = int(math.sqrt(max_images))
    if nrows * ncols != max_images:
        nrows = (max_images + ncols - 1) // ncols
    fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize=(20, 10))
    X,Y = next(batch)
    for idx, x in enumerate(X[:max_images]):
        y = Y[idx]
        ax.ravel()[idx].imshow(transforms.ToPILImage()(x))
        ax.ravel()[idx].set_title(f'{y}/{dataset_manager.label_manager.label_for_id(y.item())}')
        ax.ravel()[idx].set_axis_off()
    plt.tight_layout()
    plt.show()

This function yields batches from a dataloader one at a time using a Python generator; plot_one_batch then pulls a single batch with next(). (Note: train_dl is the DataLoader we instantiate in the Hyperparameters section below, so in the notebook this cell runs after that one.)

def generate_one_batch(dl):
    for batch in dl:
        yield batch
plot_one_batch(generate_one_batch(train_dl), max_images=20)

Model Architecture

Now, we're ready to look at the model and make a few decisions about the architecture we want to use.

Here's our requirement: we want to extract features from an image and then use a classification head to get an output distribution over the number of classes (our labels from before). We'll define a loss and use it to optimize the network.

Because we're dealing with images, a Convolutional Neural Network (CNN) seems like a good start. In the literature, as well as in fastbook, a ResNet-type architecture is used, so let's go with that and see what we can do with it.

Coding a ResNet is a separate blog post on its own, so we'll punt on that for now and use what's available to us in the form of a pretrained model.

model = resnet34(pretrained=True, progress=True)
Downloading: "https://download.pytorch.org/models/resnet34-b627a593.pth" to /root/.cache/torch/hub/checkpoints/resnet34-b627a593.pth

Changing the classifier

Since this model is trained to output a distribution over 1000 ImageNet classes, we can swap that layer for one that outputs a distribution over the classes in our dataset and then fine-tune it. To read more on fine-tuning, refer to fastbook.

model.fc = nn.Linear(512, len(dataset_manager.label_manager), bias=True)  # resnet34's final feature size is 512

Making the model fine-tunable

We'll freeze all the layers of the model except for the fc classification head we added above.

def make_fine_tunable(model):
    # Freeze the entire backbone...
    for param in model.parameters():
        param.requires_grad = False
    # ...then unfreeze only the new fc classification head
    for param in model.fc.parameters():
        param.requires_grad = True
    print("Tunable Layers: ")
    for (name, param) in model.named_parameters():
        if param.requires_grad:
            print(f'{name} -> {param.requires_grad}')
make_fine_tunable(model)

Tunable Layers: 
fc.weight -> True
fc.bias -> True
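As a quick check on how much of the network is actually trainable now, we can count parameters; the fc head maps 512 features to our 37 classes:

sum(p.numel() for p in model.parameters() if p.requires_grad)
# 18981 (512*37 weights + 37 biases)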

Trainer

Now comes the point where we have to write a training loop, and this is where things fall even deeper into the boilerplate category. We shouldn't have to write this, and that's the point of this blog post: with fastai we can offload a lot of the boilerplate to the library, use the goodies it offers to our advantage, and focus more on research/modeling.

We maintain an instance of the model, a criterion, an optimizer, and the dataloaders. In train_epoch we step through a batch and incur a loss; we use this loss to make a backward pass and let the optimizer take a step, updating the network parameters. We also have a validate function that calculates the loss and accuracy on the validation dataset after every epoch.

class Trainer():
    def __init__(self, train_dataloader, model, criterion, optimizer, test_dataloader=None):
        self.train_dl = train_dataloader
        self.model = model
        self.test_dl = test_dataloader
        self.criterion = criterion
        self.optimizer = optimizer
        self.recorder = {'loss': {
            'train': {}, 'test': {}}
            , 'accuracy': {'train': {}, 'test': {}}}
    
    def step_batch(self, X,y):
        X = X.cuda()
        y = y.cuda()
        logits = self.model(X)
        loss = self.criterion(logits, y)
        probs = F.softmax(logits, dim=1)
        return loss, logits, probs
    
    def train_epoch(self, epoch):
        self.model.train()
        running_loss = 0
        for X,y in tqdm(self.train_dl, leave=False):
            self.optimizer.zero_grad()

            loss, _, _ = self.step_batch(X,y)
            running_loss += loss.item()  # .item() detaches the loss so we don't retain the graph across batches

            loss.backward()
            self.optimizer.step()
        
        epoch_loss = running_loss / len(self.train_dl)
        self.recorder['loss']['train'][epoch] = epoch_loss

        return epoch_loss

    @torch.no_grad()
    def accuracy(self):
        self.model.eval()
        correct = 0
        total = 0

        for X,y in tqdm(self.test_dl):
            total += y.size(0)
            # Use self.model (not the global) and move the batch to the GPU
            logits = self.model(X.cuda())
            probs = F.softmax(logits, dim=1)
            _, y_pred = torch.max(probs, dim=1)
            correct += (y_pred.cpu() == y).sum().item()
        acc = correct / float(total)
        return acc


    @torch.no_grad()
    def validate(self, epoch):
        self.model.eval()

        running_loss = 0
        total = 0
        correct = 0

        for X,y in tqdm(self.test_dl, leave=False):
            y = y.cuda()
            total += y.size(0)
            loss, logits, probs = self.step_batch(X,y)
            running_loss += loss.item()
            _, y_pred = torch.max(probs, dim=1)
            correct += (y_pred == y).sum().item()
        acc = correct / float(total)
        epoch_loss = running_loss / len(self.test_dl)
        self.recorder['loss']['test'][epoch] = epoch_loss
        self.recorder['accuracy']['test'][epoch] = acc
        return epoch_loss, acc
    
    def train(self, num_epochs):
        for epoch in tqdm(range(num_epochs), leave=False):
            train_loss = self.train_epoch(epoch)
            test_loss, test_acc = self.validate(epoch)
            #print(f"Training Loss: {train_loss},\tTest Loss: {test_loss},\tTest Accuracy: {test_acc}")

Training (fine-tuning) the model

Let's send the model over to the GPU for faster training.

model = model.cuda()

Hyperparameters

Let's define a configuration class that will hold our hyperparameters.

class TrainConfig():
    def __init__(self, bs=32, lr=1e-2, seed=42, betas=(0.9, 0.999), num_workers=4):
        self.bs = bs
        self.lr = lr
        self.seed = seed
        self.betas = betas
        self.num_workers = num_workers

We set the seed for reproducibility and instantiate the dataloader objects. Notice that we're using num_workers > 1; that speeds up data loading.

config = TrainConfig(bs=128, lr=1e-3)
torch.manual_seed(config.seed)
train_dl = dataloader.DataLoader(train_dataset, batch_size=config.bs, shuffle=True, num_workers=config.num_workers)
valid_dl = dataloader.DataLoader(valid_dataset, batch_size=config.bs, shuffle=False, num_workers=config.num_workers)
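A quick shape check on one batch, just to confirm the collation works as expected:

X, y = next(iter(train_dl))
X.shape, y.shape  # (torch.Size([128, 3, 224, 224]), torch.Size([128]))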

We define our criterion as nn.CrossEntropyLoss and choose our optimizer to be an instance of optim.Adam. After that we instantiate our trainer object and train (fine-tune, in our case) for a few epochs.

Criterion, Optimizer and Training

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=config.lr, betas=config.betas)
trainer = Trainer(train_dl, model, criterion, optimizer, test_dataloader=valid_dl)
trainer.train(10)

Plotting Utilities

Helper functions for plotting the losses and accuracies we've recorded with our Trainer.

def plot_losses(losses):
    train_loss = losses['train']
    test_loss = losses['test']
    plt.style.use('fivethirtyeight')
    fig, ax = plt.subplots(figsize=(7, 4))
    ax.plot(train_loss, color='blue', label='Training Loss')
    ax.plot(test_loss, color='green', label='Test Loss')
    ax.set(title="Loss over epochs", xlabel="Epochs", ylabel="Loss")
    ax.legend()
    plt.show()
    plt.style.use('default')

def plot_accuracy(accuracy):
    plt.style.use('fivethirtyeight')
    fig, ax = plt.subplots(figsize=(7, 4))
    ax.plot(accuracy, color='blue', label='Test Accuracy')
    ax.set(title="Accuracy over epochs", xlabel="Epochs", ylabel="Accuracy")
    ax.legend()
    plt.show()
    plt.style.use('default')

Loss

losses = { k: np.asarray(list(v.values())) for k,v in trainer.recorder['loss'].items() }
plot_losses(losses)

Accuracy

accuracies = { k: np.asarray(list(v.values())) for k,v in trainer.recorder['accuracy'].items() }
plot_accuracy(accuracy=accuracies['test'])

Summary of PyTorch Version

  1. Load Data
  2. Inspect Data: Plot a few examples
  3. Create a DataLoader
  4. Define a model architecture.
  5. Write a training loop.
  6. Plot metrics.

Fast.ai Version

Now let's see how this can be done using fastai. One could treat the following cells as a completely different notebook altogether.

from fastcore.all import L
from fastai.vision.all import *
matplotlib.rc('image', cmap='Greys')

Fetch Data

We'll first download the Pets data and extract it with the untar_data function, which takes care of downloading the archive, extracting it to disk, caching it, and returning the path. It's helpful because we don't have to peek at the response object, parse it, untar it, apply filters, and then iterate through the directory; this one function does it all for us. To know more, please check out the documentation for untar_data.

path = untar_data(URLs.PETS)
Path.BASE_PATH = path
path.ls()

(#2) [Path('annotations'),Path('images')]
(path/"images").ls()

(#7393) [Path('images/great_pyrenees_160.jpg'),Path('images/shiba_inu_82.jpg'),Path('images/scottish_terrier_2.jpg'),Path('images/Russian_Blue_144.jpg'),Path('images/pomeranian_166.jpg'),Path('images/english_cocker_spaniel_48.jpg'),Path('images/japanese_chin_180.jpg'),Path('images/scottish_terrier_15.jpg'),Path('images/Sphynx_166.jpg'),Path('images/Maine_Coon_98.jpg')...]

Define DataBlock

Let's construct a DataBlock object. A DataBlock provides encapsulation over many aspects of our data loading and arranging pipeline. It lets us:

  • define the blocks that make up X and y in our dataset (this also automatically converts the labels to integer ids)
  • extract the label from the name attribute of the file
  • apply transformations, which take care of data augmentation and resizing in one go
  • randomly split the data into training and validation subsets

Notice how it does all the work of our Splitter, RegexLabelExtractor, LabelManager, and DatasetManager classes (and more; we didn't do any augmentation) in just one call. And since the library is well maintained, its version is more generic, more performant, and better tested, so we don't need to keep writing our own versions from scratch every time we're tasked with training a classifier.

pets = DataBlock(blocks = (ImageBlock, CategoryBlock),
                 get_items=get_image_files, 
                 splitter=RandomSplitter(seed=42),
                 get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
                 item_tfms=Resize(460),
                 batch_tfms=aug_transforms(size=224, min_scale=0.75))

To check that everything will work fine, there's a pretty handy function called summary that we can call on the DataBlock object; it shows us the whole plan and gives a meaningful error message if there's an issue with our pipeline somewhere.

Sanity Check

pets.summary(path/"images")

Setting-up type transforms pipelines
Collecting items from /root/.fastai/data/oxford-iiit-pet/images
Found 7390 items
2 datasets of sizes 5912,1478
Setting up Pipeline: PILBase.create
Setting up Pipeline: partial -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}

Building one sample
  Pipeline: PILBase.create
    starting from
      /root/.fastai/data/oxford-iiit-pet/images/Persian_180.jpg
    applying PILBase.create gives
      PILImage mode=RGB size=334x500
  Pipeline: partial -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}
    starting from
      /root/.fastai/data/oxford-iiit-pet/images/Persian_180.jpg
    applying partial gives
      Persian
    applying Categorize -- {'vocab': None, 'sort': True, 'add_na': False} gives
      TensorCategory(7)

Final sample: (PILImage mode=RGB size=334x500, TensorCategory(7))


Collecting items from /root/.fastai/data/oxford-iiit-pet/images
Found 7390 items
2 datasets of sizes 5912,1478
Setting up Pipeline: PILBase.create
Setting up Pipeline: partial -> Categorize -- {'vocab': None, 'sort': True, 'add_na': False}
Setting up after_item: Pipeline: Resize -- {'size': (460, 460), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (2, 0), 'p': 1.0} -> ToTensor
Setting up before_batch: Pipeline: 
Setting up after_batch: Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} -> Flip -- {'size': None, 'mode': 'bilinear', 'pad_mode': 'reflection', 'mode_mask': 'nearest', 'align_corners': True, 'p': 0.5} -> RandomResizedCropGPU -- {'size': (224, 224), 'min_scale': 0.75, 'ratio': (1, 1), 'mode': 'bilinear', 'valid_scale': 1.0, 'max_scale': 1.0, 'p': 1.0} -> Brightness -- {'max_lighting': 0.2, 'p': 1.0, 'draw': None, 'batch': False}

Building one batch
Applying item_tfms to the first sample:
  Pipeline: Resize -- {'size': (460, 460), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (2, 0), 'p': 1.0} -> ToTensor
    starting from
      (PILImage mode=RGB size=334x500, TensorCategory(7))
    applying Resize -- {'size': (460, 460), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (2, 0), 'p': 1.0} gives
      (PILImage mode=RGB size=460x460, TensorCategory(7))
    applying ToTensor gives
      (TensorImage of size 3x460x460, TensorCategory(7))

Adding the next 3 samples

No before_batch transform to apply

Collating items in a batch

Applying batch_tfms to the batch built
  Pipeline: IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} -> Flip -- {'size': None, 'mode': 'bilinear', 'pad_mode': 'reflection', 'mode_mask': 'nearest', 'align_corners': True, 'p': 0.5} -> RandomResizedCropGPU -- {'size': (224, 224), 'min_scale': 0.75, 'ratio': (1, 1), 'mode': 'bilinear', 'valid_scale': 1.0, 'max_scale': 1.0, 'p': 1.0} -> Brightness -- {'max_lighting': 0.2, 'p': 1.0, 'draw': None, 'batch': False}
    starting from
      (TensorImage of size 4x3x460x460, TensorCategory([ 7, 22,  6, 14], device='cuda:0'))
    applying IntToFloatTensor -- {'div': 255.0, 'div_mask': 1} gives
      (TensorImage of size 4x3x460x460, TensorCategory([ 7, 22,  6, 14], device='cuda:0'))
    applying Flip -- {'size': None, 'mode': 'bilinear', 'pad_mode': 'reflection', 'mode_mask': 'nearest', 'align_corners': True, 'p': 0.5} gives
      (TensorImage of size 4x3x460x460, TensorCategory([ 7, 22,  6, 14], device='cuda:0'))
    applying RandomResizedCropGPU -- {'size': (224, 224), 'min_scale': 0.75, 'ratio': (1, 1), 'mode': 'bilinear', 'valid_scale': 1.0, 'max_scale': 1.0, 'p': 1.0} gives
      (TensorImage of size 4x3x224x224, TensorCategory([ 7, 22,  6, 14], device='cuda:0'))
    applying Brightness -- {'max_lighting': 0.2, 'p': 1.0, 'draw': None, 'batch': False} gives
      (TensorImage of size 4x3x224x224, TensorCategory([ 7, 22,  6, 14], device='cuda:0'))

Dataloader

Let's create our dataloaders.

dls = pets.dataloaders(path/"images")
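The data-inspection step from the PyTorch version comes for free here too; a single call replaces our plot_one_batch helper:

dls.show_batch(nrows=3, ncols=3)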

Training (fine-tuning) the model

Let's fine-tune our model for two epochs using cnn_learner; the model we'll use is resnet34.

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(2)
epoch  train_loss  valid_loss  error_rate  time
0      1.492460    0.365994    0.117727    01:09

epoch  train_loss  valid_loss  error_rate  time
0      0.497617    0.326611    0.109608    01:14
1      0.303163    0.237988    0.077131    01:14

Notice how we didn't have to worry about sending the data or the model over to the GPU.

Interpretation and Analysis

And the cherry on top is the ability to do interpretation and error analysis with a few neatly written function calls.

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix(figsize=(12,12), dpi=60)
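We can dig further too; for instance, the following calls (also part of ClassificationInterpretation) list the category pairs the model mixes up most often and show the images it got most wrong:

interp.most_confused(min_val=3)
interp.plot_top_losses(9, figsize=(12,12))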

Summary

And we're done! Sure, the model can be improved from here, but the point is that I can now focus precisely on that right after getting started, without worrying about anything else. I'd advise reading chapter 5 of the fastbook now, since the last few cells here gloss over a few points about data augmentation, finding the right learning rate, etc.
