This hands-on guide teaches you how to use the Kaggle platform to start your own machine learning experiments. Kaggle.com is an online community of data scientists and machine learning practitioners, and you can learn a lot from the notebooks that other members share publicly. We, at aitude.com, highly recommend it to our interns.
Create a Notebook
Go to Kaggle.com and log in with your credentials. Click on Notebooks in the top menu and then click on New Notebook.
You’ll first see the settings page for the new notebook.
- Select Language – You can write code in Python or R.
- Select Type – A Notebook lets you add formatted comments alongside your code and display graphs for data visualization, whereas a Script is a plain text file. A Notebook is highly recommended for learning purposes.
- Select GPU Preference – For heavy computation, e.g. deep learning experiments or manipulating image datasets, we use a GPU instead of the CPU. Otherwise, keep it turned off because Kaggle provides a limited number of GPU hours per week.
- Enable Google Cloud Services – This powerful feature lets you use Google Cloud services directly in your notebook, e.g. to get data from BigQuery. However, Google Cloud Platform is not free and may charge for the resources used in your experiments, so you can keep it off.
Click on the Create button at the bottom, and you’ll be redirected to the notebook editor.
It’ll take a few minutes to prepare the kernel. You’ll then see an editor area with sample code on the left side and the configuration on the right side. Give the kernel a name, e.g. Simple Linear Regression, at the top left.
Next, we’ll attach a dataset to use in our project. Click on the Add Data link at the top right.
You’ll see a list of datasets with their statistics, along with filter, sort, and search options. Search for a dataset that meets your requirements and then click on the Add button to attach it to the project. You can see all the attached datasets in the Data section on the right side.
You can also upload your own dataset and make it public for the community to use.
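Once a dataset is attached, its files can be read from the notebook's file system. The following is a minimal sketch that lists them, assuming the attached datasets are mounted under /kaggle/input (the usual default in Kaggle notebooks). The helper name list_input_files is hypothetical and introduced only for illustration.

```python
import os


def list_input_files(root="/kaggle/input"):
    """Return the sorted paths of every file found under `root`.

    `/kaggle/input` is assumed to be where attached datasets are mounted;
    pass a different directory if your environment differs.
    """
    paths = []
    for dirname, _, filenames in os.walk(root):
        for filename in filenames:
            paths.append(os.path.join(dirname, filename))
    return sorted(paths)


# Print one path per line; prints nothing if the directory does not exist.
for path in list_input_files():
    print(path)
```

The printed paths can then be passed to whichever loader you prefer, for example a CSV reader.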
Add New Package
A large set of packages is available in the development environment, but you can also install new packages using the following steps:
In your kernel:
- Go to the Settings section on the right side.
- Turn on the Internet because packages are downloaded from online repositories. You can turn it off once the library is installed.
Use the !pip install command to install the new package in the current Docker image.
!pip install ultimate
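Note that this is a one-time task for the running session, so once the package is installed, you can remove this code. A fresh session may start without the package, in which case you will need to install it again.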
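One way to avoid re-running the install unnecessarily is to check whether the package can already be imported. The sketch below uses the standard library's importlib.util.find_spec. The helper name is_installed is hypothetical, and the sketch assumes the importable module name matches the pip package name, which is not always the case.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module `module_name` can be imported."""
    return importlib.util.find_spec(module_name) is not None


# In a notebook cell you could then install only when needed, for example:
#   if not is_installed("ultimate"):
#       !pip install ultimate
print(is_installed("json"))  # json ships with Python, so this prints True
```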
There are two types of cells in a notebook: Code cells and Markdown cells. You need to create a Code cell to write code.
You can divide your program into small code cells and execute them one by one, which makes debugging easier. Click on the forward icon on the left side to execute a cell; the output is shown below the code cell. You can remove a code cell by clicking the Delete icon on the right side.
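As an illustration of splitting a program into cells, here is a small sketch of a simple linear regression written in plain Python. Each "Cell" comment marks what could be a separate code cell. The data values and the function name fit_line are made up for this example and are not taken from any Kaggle dataset.

```python
# Cell 1: a tiny made-up dataset (hypothetical values, for illustration only)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]


# Cell 2: fit y = slope * x + intercept by ordinary least squares
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)

# Cell 3: inspect the result
print(slope, intercept)  # prints: 2.0 1.0
```

Running the cells one at a time lets you check the data in Cell 1 before fitting, and re-run only Cell 3 when you change how the result is displayed.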
Notebooks are very useful for explaining code for future modification or for distributing it to the Kaggle community. Markdown cells are used for this purpose. They use Markdown, a simple language for formatting text. You can read the complete Markdown Cells documentation.
e.g. for headings, you can use # h1, ## h2, ### h3, etc. (with a space after the # characters).
Once your notebook is ready and runs successfully, you should commit it so that you can start your next experiment by building on the existing code. Each commit creates a new version, which is listed in the Versions section on the right side.
I’d recommend that you share your experiments publicly to get reviews from researchers and experienced programmers. There are two ways to make your kernel public.
The first is to go to the Settings section on the right side of the kernel editor and set the Sharing option to Public.
The second is to open the kernel viewer by clicking on a committed version in the Versions section on the right side.
Then click on the Sharing link in the top right corner of the kernel viewer. In the dialog box that pops up, set the Privacy option to Public and click on the Save button.
At aitude.com, we love to share our experiments. Below are some useful experiments:
- KFold Implementation for Energy Prediction
- Handle Missing Weather Information
- Hyper-parameter Tuning in LightGBM
Best Practices for Beginners
Are you new to the Kaggle Platform? We’d like to share how-to-start tips based on our experience on this platform.
- If you have not yet studied Pandas, NumPy, and Matplotlib in depth, practice them first. It’ll make it much easier to understand other Kagglers’ notebooks, pick up ideas, and implement your own.
- Go to Public Notebooks, work through the Starter Kit or EDA notebooks, try to understand them, and implement them yourself.
- Get ideas, build your own notebooks, and share them with the community.
- Don’t hesitate to ask questions on Kaggle’s discussion forums. Kaggle is a wonderful community for machine learning enthusiasts.