
Course software for decision trees / random forests

In the practicals, we will use only a small dataset and will need little computing power and memory. You can therefore do the practicals on various computing platforms. However, since participants may use various types of computers and software, we recommend using the UNIL JupyterLab for the practicals.

  • JupyterLab: Working on the cloud is convenient because the installation of the Python and R packages is already done, and you will work in a Jupyter Notebook style even if you use R.

  • Laptop: This is good if you want to work directly on your laptop, but you will need to install the required libraries yourself.

    Warning: We will give general instructions on how to install the libraries on your laptop, but it is sometimes tricky to find the right library versions, and we will not be able to help you with the installation. The installation should take about 15 minutes.

  • Curnagl: This is efficient if you are used to working on a cluster or if you intend to use one in the future to work on large projects. If you do not have an account to access the UNIL cluster Curnagl, please contact us at least a week before the course so that we can give you a temporary account. The installation should take about 15 minutes. Note that it is also possible to use JupyterLab on Curnagl: see https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/jupyterlab-on-the-curnagl-cluster

The installation should take about 15 minutes. If you have difficulties during the installation, we can help you so please contact us before the course.

Laptop

You may need to install development tools including a C and Fortran compiler (e.g. Xcode on Mac, gcc and gfortran on Linux, Visual Studio on Windows).

Python installation

Here are some instructions for installing decision tree and random forest libraries on your laptop. You need Python >= 3.7.
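If you are unsure which Python version you have, a minimal sketch like the one below can check it against the minimum stated above. The function name python_is_recent_enough is only an illustrative helper and is not part of any course material.

```python
import sys

# Minimum version stated in these instructions.
MIN_PYTHON = (3, 7)


def python_is_recent_enough(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given interpreter version meets the minimum."""
    # Compare only the (major, minor) part of the version.
    return tuple(version_info[:2]) >= minimum


print("Python", sys.version.split()[0])
if python_is_recent_enough():
    print("This Python version is recent enough.")
else:
    print("This Python version is too old: please install Python >= 3.7.")
```

Save it to a file and run it with python3, or paste it into an interactive Python session.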

For Mac and Linux

We will use a terminal to install the libraries.

Let us create a virtual environment. Open  your terminal and type:

python3 -m venv mlcourse

source mlcourse/bin/activate

pip3 install scikit-learn pandas matplotlib graphviz seaborn
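While the virtual environment is still active, you may want to confirm that the libraries can be found. The sketch below is one possible check, run with python3. It only looks for the modules without importing them. The PACKAGES table and the find_missing helper are illustrative names; note that scikit-learn is imported under the module name sklearn.

```python
import importlib.util

# Maps each pip package name to the module name used in "import".
# scikit-learn is imported as "sklearn".
PACKAGES = {
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "graphviz": "graphviz",
    "seaborn": "seaborn",
}


def find_missing(packages=PACKAGES):
    """Return the pip names of the packages whose module cannot be found."""
    return [pip_name for pip_name, module in packages.items()
            if importlib.util.find_spec(module) is None]


missing = find_missing()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All course packages were found.")
```

If some packages are reported as missing, check that the environment was activated before running pip3.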

You can terminate the current session:

deactivate

exit

TO DO THE PRACTICALS (today or another day):

You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:

source mlcourse/bin/activate

pip3 install notebook

jupyter notebook
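To see whether the notebook really runs inside the virtual environment, a short check like the following can be run in a notebook cell. It is a sketch that relies on the fact that, in an environment created with python3 -m venv, sys.prefix differs from sys.base_prefix. The helper name in_virtual_environment is hypothetical.

```python
import sys


def in_virtual_environment():
    """Return True if Python runs inside a venv-style virtual environment."""
    # In a venv, sys.prefix points to the environment directory, while
    # sys.base_prefix still points to the base Python installation.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Virtual environment active:", in_virtual_environment())
```

If the environment is active, the interpreter path should contain the mlcourse directory.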

For Windows

If you do not have Python installed, you can use either Conda: https://docs.conda.io/en/latest/miniconda.html or the official Python installer: https://www.python.org/downloads/windows/

Let us create a virtual environment. Open  your terminal and type:

C:\Users\user>python -m venv mlcourse

C:\Users\user>mlcourse\Scripts\activate.bat

(mlcourse) C:\Users\user>

(mlcourse) C:\Users\user>pip3 install scikit-learn pandas matplotlib graphviz seaborn
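The graphviz package installed by pip is only a Python interface. Rendering a tree as an image also needs the Graphviz program dot to be installed on the system and visible on PATH. Whether the practicals need this rendering step is an assumption here, so treat the following as an optional check. The helper name find_dot_executable is illustrative.

```python
import shutil


def find_dot_executable():
    """Return the full path of the Graphviz "dot" program, or None if absent."""
    # shutil.which searches the directories listed in PATH.
    return shutil.which("dot")


dot_path = find_dot_executable()
if dot_path is None:
    print('Graphviz "dot" was not found on PATH; rendering trees may fail.')
else:
    print("Graphviz dot found at:", dot_path)
```

The same check works on Mac and Linux.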

You can terminate the current session:

(mlcourse) C:\Users\user>deactivate

C:\Users\user>

TO DO THE PRACTICALS (today or another day):

You can use any Python IDE (e.g. Jupyter Notebook or PyCharm), but you need to launch it after activating the virtual environment. For example, for Jupyter Notebook:

C:\Users\user>mlcourse\Scripts\activate.bat

(mlcourse) C:\Users\user>pip3 install notebook

(mlcourse) C:\Users\user>jupyter notebook

Information: Use Control-C to stop this server.

R installation

Here are some instructions for installing decision tree and random forest libraries on your laptop.

You need R >= 4.0. Run R in your terminal or launch RStudio.

For Windows users, you can download R here: https://cran.r-project.org/bin/windows/base/

REMARK: The R libraries will be installed in your home directory. To allow it, you must answer yes to the questions:

Would you like to use a personal library instead? (yes/No/cancel) yes

Would you like to create a personal library to install packages into? (yes/No/cancel) yes

And select Switzerland for the CRAN mirror.

install.packages("rpart")

install.packages("rpart.plot")

install.packages("randomForest")

install.packages("tidyverse")

The installation of "tidyverse" may report some conflicts, but do not worry: you should still be able to do the practicals.

You can terminate the current R session:

q()

Save workspace image? [y/n/c]: n

TO DO THE PRACTICALS (today or another day):

Simply run R in your terminal or launch RStudio.

JupyterLab

Here are some instructions for installing decision tree and random forest libraries on the JupyterLab of the EPFL.

Go to the webpage: https://noto.epfl.ch/

Use your Switch AAI login: University of Lausanne

Enter the login and password associated with your Switch edu-ID (and NOT your UNIL credentials).

Python installation

Click on the Terminal square button in the Other panel (at the bottom of the page).

Let us create a virtual environment. Type (or copy/paste):

my_venvs_create mlcourse

Press return.

Then type and run the following commands:

my_venvs_activate mlcourse

pip install scikit-learn pandas matplotlib graphviz seaborn

my_kernels_create Decision_Trees "Decision Trees"

my_venvs_deactivate

The installation is complete!

Select File / Log out to close the Jupyter session.

TO DO THE PRACTICALS (today or another day):

Log into the JupyterLab, double click on the "my_notebooks" folder (left panel) and then click on the "Decision Trees" square button in the Notebook panel.

To execute a command, click on "Run the selected cells and advance" (the right arrow), or SHIFT + RETURN.
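As a first cell, a small test like the one below could confirm that the kernel sees scikit-learn. It is only a sketch: the iris dataset bundled with scikit-learn, the max_depth value and the function name fit_small_tree are illustrative choices, not part of the course material. If scikit-learn cannot be imported, the function returns None instead of raising an error.

```python
def fit_small_tree():
    """Fit a tiny decision tree on iris; return training accuracy or None."""
    try:
        from sklearn.datasets import load_iris
        from sklearn.tree import DecisionTreeClassifier
    except ImportError:
        # scikit-learn is not visible from this kernel or interpreter.
        return None
    X, y = load_iris(return_X_y=True)
    tree = DecisionTreeClassifier(max_depth=2, random_state=0)
    tree.fit(X, y)
    # Accuracy on the training data itself: only a check that things run.
    return tree.score(X, y)


accuracy = fit_small_tree()
if accuracy is None:
    print("scikit-learn could not be imported in this kernel.")
else:
    print("Decision tree fitted, training accuracy:", round(accuracy, 3))
```

The accuracy is computed on the training data, so it says nothing about model quality; it only shows that the installation works.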

When you have finished the practicals, select File / Log out.

R installation

Click on the R square button in the Notebook panel, and type (or copy/paste) in the first cell:

install.packages("rpart")

install.packages("rpart.plot")

install.packages("randomForest")

install.packages("tidyverse")

To execute this command, click on "Run the selected cells and advance" (the right arrow), or SHIFT + RETURN.

The installation of "tidyverse" may report some conflicts, but do not worry: you should still be able to do the practicals.

The installation is complete!

Select File / Log out to close the Jupyter session.

TO DO THE PRACTICALS (today or another day):

Log into the JupyterLab, double click on the "my_notebooks" folder (left panel) and then click on the R square button in the Notebook panel.

When you have finished the practicals, select File / Log out.

Curnagl

For the practicals, it will be convenient to copy/paste text from a web page to the terminal on Curnagl, so please make sure you can do this before the course. You also need to make sure that your terminal has an X server.

For Mac users, download and install XQuartz (X server): https://www.xquartz.org/

For Windows users, download and install the MobaXterm terminal (which includes an X server). Click on the "Installer edition" button on the following webpage: https://mobaxterm.mobatek.net/download-home-edition.html

For Linux users, you do not need to install anything.

Python installation

Here are some instructions for installing decision tree and random forest libraries on the UNIL cluster called Curnagl. Open a terminal on your laptop and type (if you are located outside the UNIL you will need to activate the UNIL VPN):

ssh -Y < my unil username >@curnagl.dcsr.unil.ch

Here and in what follows we added the brackets < > to emphasize the username, but you should not write them in the command. Enter your UNIL password.

For Windows users with the MobaXterm terminal: Launch MobaXterm, click on Start local terminal and type the command ssh -Y < my unil username >@curnagl.dcsr.unil.ch. Enter your UNIL password. Then you should be on Curnagl. Alternatively, launch MobaXterm, click on the session icon and then click on the SSH icon. Fill in: remote host = curnagl.dcsr.unil.ch, specify username = < my unil username >. Finally, click ok and enter your password. If you are asked "do you want to save password?", say No if you are not sure. Then you should be on Curnagl.

See also the documentation: https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/ssh-connection-to-dcsr-cluster

cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc/

mkdir < my unil username >

cd < my unil username >

For convenience, you will install the libraries from the front-end node to do the practicals. Note, however, that it is normally recommended to install libraries from the interactive partition (Sinteractive -m 4G -c 1).

module load gcc python/3.9.13

python -m venv mlcourse

source mlcourse/bin/activate

pip install scikit-learn pandas matplotlib graphviz seaborn

You can terminate the current session:

deactivate

exit

TO DO THE PRACTICALS (today or another day):

ssh -Y < my unil username >@curnagl.dcsr.unil.ch

cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc/< my unil username >

For convenience, you will work directly on the front-end node to do the practicals. Note, however, that working directly on the front-end node is normally not allowed, and you should normally use an interactive session instead (Sinteractive -m 4G -c 1).

module load gcc python/3.9.13

source mlcourse/bin/activate

python
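Once the Python prompt appears, you can check that X forwarding is available, since plots opened in a window need it. Connecting with ssh -Y normally sets the DISPLAY environment variable on the cluster side. The sketch below only reads that variable and does not try to open a window. The helper name x_display is illustrative.

```python
import os


def x_display(environ=os.environ):
    """Return the X display name from DISPLAY, or None if it is not set."""
    value = environ.get("DISPLAY", "")
    # Treat an empty value the same as a missing variable.
    return value or None


display = x_display()
if display is None:
    print("DISPLAY is not set: graphical windows will probably not open.")
else:
    print("X display:", display)
```

If DISPLAY is not set, check that your X server (XQuartz or MobaXterm) is running and that you connected with the -Y option.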

R installation

Here are some instructions for installing decision tree and random forest libraries on the UNIL cluster called Curnagl. Open a terminal on your laptop and type (if you are located outside the UNIL you will need to activate the UNIL VPN):

ssh -Y < my unil username >@curnagl.dcsr.unil.ch

Here and in what follows we added the brackets < > to emphasize the username, but you should not write them in the command. Enter your UNIL password.

For Windows users with the MobaXterm terminal: Launch MobaXterm, click on Start local terminal and type the command ssh -Y < my unil username >@curnagl.dcsr.unil.ch. Enter your UNIL password. Then you should be on Curnagl. Alternatively, launch MobaXterm, click on the session icon and then click on the SSH icon. Fill in: remote host = curnagl.dcsr.unil.ch, specify username = < my unil username >. Finally, click ok and enter your password. If you are asked “do you want to save password?”, say No if you are not sure. Then you should be on Curnagl.

See also the documentation: https://wiki.unil.ch/ci/books/high-performance-computing-hpc/page/ssh-connection-to-dcsr-cluster

cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc/

mkdir < my unil username >

cd < my unil username >

For convenience, you will install the libraries from the front-end node to do the practicals. Note, however, that it is normally recommended to install libraries from the interactive partition (Sinteractive -m 4G -c 1).

module load gcc python/3.9.13 r/4.2.1

R

REMARK: The R libraries will be installed in your home directory. To allow it, you must answer yes to the questions:

Would you like to use a personal library instead? (yes/No/cancel) yes

Would you like to create a personal library to install packages into? (yes/No/cancel) yes

And select Switzerland for the CRAN mirror.

install.packages("rpart")

install.packages("rpart.plot")

install.packages("randomForest")

install.packages("tidyverse")

The installation of "tidyverse" may report some conflicts, but do not worry: you should still be able to do the practicals.

You can terminate the current R session:

q()

Save workspace image? [y/n/c]: n

TO DO THE PRACTICALS (today or another day):

ssh -Y < my unil username >@curnagl.dcsr.unil.ch

cd /work/TRAINING/UNIL/CTR/rfabbret/cours_hpc/< my unil username >

For convenience, you will work directly on the front-end node to do the practicals. Note, however, that working directly on the front-end node is normally not allowed, and you should normally use an interactive session instead (Sinteractive -m 4G -c 1).

module load gcc python/3.9.13 r/4.2.1

R