One common challenge when working with containerized Python environments, such as the OpenShift AI Workbench, is ensuring that your installed packages persist across sessions. Normally, when a container restarts, all the installed packages that were not part of the original image are lost. This can be frustrating, especially in development workflows that rely on a specific set of packages.
Fortunately, there is a solution that allows Python packages to persist, by leveraging the `PIP_TARGET` and `PYTHONPATH` environment variables.
The Challenge
In OpenShift AI Workbench, every launch creates a fresh environment. Any libraries or packages installed via pip are lost when the workbench is stopped or restarted, wasting time and computational resources as you reinstall the same packages every session.
The Solution
By setting the `PIP_TARGET` and `PYTHONPATH` environment variables, we can direct pip to install packages into a specified directory that persists across restarts, and configure Python to look in that directory when importing packages.
Set PIP_TARGET
The `PIP_TARGET` variable tells pip where to install packages. Rather than installing into the global site-packages directory, pip installs into the specified directory, which, in the case of OpenShift AI Workbench, should be within the persistent folder. For this environment, we're utilizing `/opt/app-root/src`, a directory that remains intact across workbench restarts. Set `PIP_TARGET` to `/opt/app-root/src/.pip` or another subdirectory within `/opt/app-root/src`.
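As a minimal sketch, assuming you are experimenting from a workbench terminal (in the OpenShift AI console you would typically set this in the workbench's environment variable settings instead), the variable only needs to hold the target path:

```bash
# Point pip at a directory under the persistent volume.
# In OpenShift AI, prefer setting this as a workbench environment
# variable so it survives new terminal sessions too.
export PIP_TARGET=/opt/app-root/src/.pip
```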
Set PYTHONPATH
Once `PIP_TARGET` directs pip to install packages into our persistent directory, `PYTHONPATH` comes into play. Setting `PYTHONPATH` to the same directory ensures that Python can find and import these packages at runtime, even after a restart or redeployment. Set it to the same value as `PIP_TARGET` so Python looks there for your custom-installed packages.
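A quick sketch, again assuming a terminal session: directories listed in `PYTHONPATH` are added to Python's module search path (`sys.path`), which you can verify directly:

```bash
# Make Python search the same directory at import time.
export PYTHONPATH=/opt/app-root/src/.pip

# Sanity check: the directory should now appear in sys.path.
python -c "import sys; print('/opt/app-root/src/.pip' in sys.path)"
```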
Mount an Extra Volume (Optional)
If you require additional storage or want to separate your package storage from other workbench data, you can mount an extra volume in OpenShift and point `PIP_TARGET` to a directory on that volume. This gives you more control over your environment and can be particularly useful when working with large packages or datasets.
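For illustration only, assuming a hypothetical extra volume mounted at `/opt/app-root/packages` (the mount path is whatever you choose when attaching the volume), both variables would simply point there instead:

```bash
# Hypothetical mount path for the extra volume; substitute your own.
export PIP_TARGET=/opt/app-root/packages/.pip
export PYTHONPATH=/opt/app-root/packages/.pip
```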
Finally, restart the workbench so the new environment variables take effect; from then on, your installed Python packages will persist across restarts.
Verification
In this example, we're installing `Faker`, a handy library for generating fake data:
```bash
pip install faker
```
Check the target directory to ensure that the Faker library files are present:
```bash
ls /opt/app-root/src/.pip/faker/
ls /opt/app-root/src/.pip/Faker-24.11.0.dist-info/
```
Create a simple Python script to test importing both a standard library (`json`) and your custom-installed package (`Faker`):
```python
import json
from faker import Faker

def test_standard_and_custom_packages():
    fake = Faker()

    # Generate fake data using Faker
    user_data = {
        "name": fake.name(),
        "email": fake.email(),
        "address": fake.address(),
    }

    # Convert the dictionary to a JSON string
    user_data_json = json.dumps(user_data, indent=4)

    # Print the JSON data
    print("Generated Fake User Data in JSON Format:")
    print(user_data_json)

test_standard_and_custom_packages()
```
Run the script. If the environment is set up correctly, you should be able to import the `Faker` library from the custom path, and the script should output something similar to the following:
```
Generated Fake User Data in JSON Format:
{
    "name": "Tony Mejia",
    "email": "[email protected]",
    "address": "2156 Torres Keys\nCarolynburgh, ND 38868"
}
```