ScikitLearn.jl

ScikitLearn.jl lets you use many of the statistical and machine learning models from Python's scikit-learn library directly in Julia. It supports tasks such as prediction and classification through a beginner-friendly interface.

With ScikitLearn.jl, you can:

  • Train and evaluate machine learning models

  • Use toy datasets to explore machine learning models

Installation & Setup

First, make sure you have Julia installed. On Oscar, you can simply run the command module load julia in a terminal. Otherwise, refer to this page to install the appropriate version of Julia for your computer.

Once Julia is installed, start the Julia interactive session (the REPL) by running the command julia.

In the interactive session, enter the following commands to download the required Python package:

julia
using Conda
Conda.add("scikit-learn")

This command installs Python's scikit-learn package into your Conda environment. Next, still in the Julia session, run the following commands one at a time (they might take a while, so be patient):

julia
using Pkg
Pkg.add("ScikitLearn")
Pkg.add("DecisionTree") # Add external decision tree model

If you are loading ScikitLearn for the first time, some components may still need to be installed. Julia should automatically show you the relevant installation prompts.
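A quick way to confirm the setup is to load the package and import one model. This is only a minimal sketch, assuming the installation steps above completed without errors; the exact status output depends on your environment.

```julia
using Pkg
Pkg.status("ScikitLearn")   # lists the installed ScikitLearn.jl version, if any

using ScikitLearn
# Importing a model exercises the link to Python's scikit-learn;
# errors or prompts at this step usually point to a setup problem.
@sk_import linear_model: LogisticRegression

check_model = LogisticRegression()   # construct an untrained model
println(check_model)
```

If this prints a model description, the Julia package and its Python backend can find each other.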

Example 1: Logistic Regression

ScikitLearn has several 'toy' datasets that can be used for experimentation and development (see here). We'll use a well-known dataset of iris flowers to train a model that predicts a flower's type from quantitative descriptive data. We will start with a basic logistic regression model (more info here).

julia
using ScikitLearn # Load ScikitLearn
using ScikitLearn: fit!, predict, score # Load several methods that will be relevant
@sk_import linear_model: LogisticRegression # Logistic regression model
@sk_import datasets: load_iris # Load ScikitLearn's Iris dataset

# Load the iris flower dataset. The result is a dictionary-like object with keys such as "data" and "target"
data = load_iris()
X = data["data"]    # features (petal length, width, etc.)
y = data["target"]  # labels (0, 1, or 2)

# We'll just try to predict between class 0 and class 1 (ignore class 2)
X_small = X[y .!= 2, :]
y_small = y[y .!= 2]

# Create the logistic regression model
model = LogisticRegression()

# Call fit! with your model and data to train the model
fit!(model, X_small, y_small)

# Make predictions
predictions = predict(model, X_small)

# Check accuracy (measured on the same data the model was trained on)
accuracy = score(model, X_small, y_small)

println("Logistic Regression Accuracy: ", accuracy)
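The step that drops class 2 in the example above relies on boolean-mask indexing. The sketch below shows the same pattern in plain Julia on a tiny made-up array (the values are illustrative, not taken from the Iris dataset), so you can see which rows are kept.

```julia
# Made-up feature matrix: 4 samples (rows) with 2 features (columns)
X_demo = [5.1 3.5;
          7.0 3.2;
          6.3 3.3;
          4.9 3.0]
y_demo = [0, 1, 2, 0]          # made-up class labels

mask = y_demo .!= 2            # element-wise comparison gives a Bool per sample
X_demo_small = X_demo[mask, :] # keep the rows where mask is true, all columns
y_demo_small = y_demo[mask]    # keep the matching labels

println(size(X_demo_small))
println(y_demo_small)
```

The dot in `.!=` broadcasts the comparison over every element of the label vector, which is why the mask has one entry per sample.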

Example 2: Decision Tree

Now let’s try using a decision tree to classify the same flowers.

julia
using ScikitLearn # Load ScikitLearn
using ScikitLearn: fit!, predict, score # Load several methods that will be relevant
@sk_import datasets: load_iris # Load ScikitLearn's Iris dataset
@sk_import tree: DecisionTreeClassifier # Load ScikitLearn's DecisionTreeClassifier

# Load the dataset; we will use all three classes this time
data = load_iris()
X = data["data"]
y = data["target"]

# Create a decision tree model
tree_model = DecisionTreeClassifier(max_depth=3)

# Train the decision tree
fit!(tree_model, X, y)

# Make predictions
tree_predictions = predict(tree_model, X)

# Check accuracy (measured on the same data the model was trained on)
tree_accuracy = score(tree_model, X, y)

println("Decision Tree Accuracy: ", tree_accuracy)

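Note that the 'simpler' logistic regression model may actually outperform the more complex decision tree. In this case that is largely due to the simplicity of the Iris dataset. Keep in mind that the two scores are not directly comparable: the logistic regression was trained on only two classes, the decision tree on all three, and each model was scored on its own training data.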
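Both examples score a model on the data it was trained on, so a held-out evaluation gives a fairer comparison. The following is a minimal sketch, assuming your installed ScikitLearn.jl provides the CrossValidation submodule with cross_val_score. The max_iter=200 setting is an added assumption (a scikit-learn parameter) meant to give the solver more iterations on the three-class problem.

```julia
using ScikitLearn
using ScikitLearn.CrossValidation: cross_val_score
using Statistics: mean
@sk_import linear_model: LogisticRegression
@sk_import tree: DecisionTreeClassifier
@sk_import datasets: load_iris

data = load_iris()
X = data["data"]
y = data["target"]   # all three classes, for both models

# 5-fold cross-validation: each fold is held out once and used only for scoring
lr_scores = cross_val_score(LogisticRegression(max_iter=200), X, y; cv=5)
tree_scores = cross_val_score(DecisionTreeClassifier(max_depth=3), X, y; cv=5)

println("Logistic regression mean CV accuracy: ", mean(lr_scores))
println("Decision tree mean CV accuracy: ", mean(tree_scores))
```

Here both models see the same three-class task and are scored only on samples they were not trained on, so their averages can be compared directly.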

Key Terms to Know

Term       What It Means
fit!       Teach the model using your data
predict    Ask the model to guess based on new data
score      Measure how good the model is; for classifiers this is accuracy (1.0 = every prediction correct, 0.0 = none correct)
X          The input data (features)
y          The correct answers (labels)
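For a classifier, score reports accuracy: the fraction of predictions that match the true labels. The sketch below reproduces that calculation in plain Julia with made-up labels. The helper name accuracy is hypothetical and not part of ScikitLearn.jl.

```julia
using Statistics: mean

# Hypothetical helper (not part of ScikitLearn.jl):
# fraction of positions where the prediction equals the true label
accuracy(y_true, y_pred) = mean(y_true .== y_pred)

y_true = [0, 1, 1, 0]   # made-up correct answers
y_pred = [0, 1, 0, 0]   # made-up model guesses; the third one is wrong

acc = accuracy(y_true, y_pred)
println("Accuracy: ", acc)   # 3 of 4 predictions match
```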

Resources
