Managing Python environments¶
This page covers some important things Apocrita Python users need to know: how to manage and curate your personal environments, and how to tackle some common problems along the way.
Common issues¶
Anaconda and Miniconda are not available¶
Due to licensing issues, Anaconda and Miniconda are not available on Apocrita. Instead, we only offer a miniforge module. See our documentation for more detailed information.
Don't use the defaults channel, use nodefaults ONLY¶
Some Conda packages you install may suggest using the defaults channel. This is actively discouraged in the official Mamba documentation (see here for more information about Mamba and why we recommend all users use it). Instead, you should only use nodefaults, which disables the defaults channel and uses only the conda-forge channel.
Any ~/.condarc file used should look something like this:
channels:
- nodefaults
ssl_verify: true
## Optional - store Conda environments in an alternative location
envs_dirs:
- /data/scratch/abc123/anaconda/envs
pkgs_dirs:
- /data/scratch/abc123/anaconda/pkgs
Please do not define any additional channels, or extra configuration like:
channel_priority: flexible
auto_activate_base: false
This is highly likely to lead to package installation failures. If your package
requires additional channels to install, please define them using the -c flag
during installation in the order you want to use them, e.g.:
mamba install -c bioconda -c conda-forge cellrank
Remove packages in ${HOME}/.local/lib¶
The ITSR team receives a lot of tickets from users encountering issues with both Python and Conda, with errors such as:
File "/data/home/abc123/.local/lib/python3.10/site-packages/tensorflow/python/training/saver.py", line 38, in <module>
from tensorflow.python.framework import meta_graph
File "/data/home/abc123/.local/lib/python3.10/site-packages/tensorflow/python/framework/meta_graph.py", line 18, in <module>
from packaging import version as packaging_version # pylint: disable=g-bad-import-order
ModuleNotFoundError: No module named 'packaging'
You'll notice that the path begins with a user's home directory (${HOME}). These packages are "orphaned": they were installed incorrectly because no Python virtualenv or Conda environment was correctly activated at the time. In such cases, pip install commands fall back to the --user location of ${HOME}/.local/lib, which in the context of Apocrita causes a huge number of issues.
If any "orphaned" packages exist in your own home directory, clear these out using the following command:
rm -rf ${HOME}/.local/lib/python*
This will remove the directories for any Python version under ${HOME}/.local/lib; this directory should always remain free of any directories named python* (like python3.10, python3.12 etc.). Before raising a ticket in future, first check that this directory contains none of these.
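As a quick sketch of such a check: the LOCAL_LIB variable below is purely illustrative (it simply defaults to the real path of ${HOME}/.local/lib), and no output from ls means the directory is clean.

```shell
# Check for "orphaned" user-site package directories.
# LOCAL_LIB is a hypothetical override for illustration; in practice it
# defaults to ${HOME}/.local/lib.
LOCAL_LIB="${LOCAL_LIB:-${HOME}/.local/lib}"
if ls -d "${LOCAL_LIB}"/python* >/dev/null 2>&1; then
    echo "Orphaned package directories found - consider removing them"
else
    echo "No orphaned package directories found"
fi
```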
Be sure your Conda environment is fully activated
Python packages will often end up in ${HOME}/.local/lib if you create and
activate a Conda environment and then immediately run a pip install
command, e.g.:
module load miniforge
mamba create -n myenv
mamba activate myenv
pip install <package>
This is because the environment was created without its own Python interpreter, so pip resolves to a Python outside the environment. To avoid this, either specify a version of Python during environment creation:
module load miniforge
mamba create -n myenv python=3.12
mamba activate myenv
pip install <package>
Or, make sure you run a mamba install command before any subsequent
pip install commands:
module load miniforge
mamba create -n myenv
mamba activate myenv
mamba install pip
pip install <package>
It's best to avoid mixing mamba install and pip install wherever possible: use mamba install exclusively in Conda environments and pip install exclusively in virtualenvs. However, we understand that sometimes not everything you need is packaged in Conda and you have to install certain packages from PyPI. In such cases, tread carefully.
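Before running any pip install in a Conda environment, it can help to confirm that pip resolves inside the environment rather than somewhere that will fall back to ${HOME}/.local/lib. This is a sketch assuming an activated Conda environment, where $CONDA_PREFIX points at the environment root:

```shell
# Verify that pip belongs to the currently activated Conda environment.
# Assumes $CONDA_PREFIX has been set by `mamba activate`.
PIP_PATH=$(command -v pip || true)
case "$PIP_PATH" in
    "${CONDA_PREFIX:-/nonexistent}"/*)
        echo "pip belongs to the active environment" ;;
    *)
        echo "warning: pip is '${PIP_PATH:-not found}'; run 'mamba install pip' first" ;;
esac
```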
Clear caches¶
Should you wish to start over with a "clean slate", you can clear your cached Python/Conda packages.
Clear Python cache¶
There is no pip binary without an active virtualenv
When you load a Python module on Apocrita, the pip binary will not be
available unless you are in an activated virtual environment. This is to try
to avoid
"orphaned" packages ending up in your home directory.
To clear your pip cache entirely, use an interactive salloc session to activate any current Python virtualenv and make pip available:
$ salloc
$ module load python
$ source /path/to/virtualenv/bin/activate
$ pip cache purge
Files removed: 435
Note, this may take some time depending on the number of cached files you have.
Clear Conda cache¶
Conda uses a pkgs directory to cache any packages you install, to speed up future installations of the same package. By default, this package cache is stored in:
${HOME}/.conda/pkgs
Use the correct location
If you have a
modified ${HOME}/.condarc file
then this path may be elsewhere, such as your scratch directory. If so,
adjust the instructions below as necessary.
To remove your pkgs cache, use an interactive salloc session:
$ salloc
$ module load miniforge
$ rm -rf ${HOME}/.conda/pkgs
$ mamba clean -a
There are no unused tarball(s) to remove.
There are no index cache(s) to remove.
There are no unused package(s) to remove.
There are no tempfile(s) to remove.
There are no logfile(s) to remove.
Note, this may take some time depending on the number of cached files you have.
Backing up environments¶
Sometimes you may wish to back up and/or move existing Python environments. You should only do this if you require as close a replica of an environment as possible, i.e. with pinned package versions. Most users should instead install fresh, updated and un-pinned versions of any packages they require, according to the packages' official installation instructions.
Back up Python virtualenvs¶
You can use the pip freeze command to take a snapshot of any Python virtualenv
and redirect this to a requirements.txt file. To do this, use an
interactive salloc session:
salloc
source /path/to/virtualenv/bin/activate
pip freeze > requirements.txt
If you have multiple Python environments to migrate, then use a memorable name
for each one (myenv1.txt, myenv2.txt etc.) and repeat the process above
for each one. Once you have backed them all up you can
re-create them later on.
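If your virtualenvs all live in one place, a loop saves repeating the process by hand. This is a sketch only: the ENVS_DIR location of ~/envs is an assumption for illustration, so adjust it to wherever your virtualenvs are actually stored.

```shell
# Freeze every virtualenv found under ENVS_DIR into its own <name>.txt file.
# ENVS_DIR is a hypothetical location; change it to match your setup.
ENVS_DIR="${ENVS_DIR:-${HOME}/envs}"
for env in "${ENVS_DIR}"/*/; do
    [ -x "${env}bin/pip" ] || continue   # skip directories that are not virtualenvs
    name=$(basename "$env")
    "${env}bin/pip" freeze > "${name}.txt"
done
```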
Back up Conda environments¶
Use Mamba
Remember to use mamba in place of all conda commands as it is markedly
faster. See this blog post for more
information.
You can export Conda environments to a YAML file that can then be used to re-create them later. To do this, again, use an interactive salloc session:
interactive salloc session:
salloc
module load miniforge
mamba activate myenv
mamba env export > environment.yml
Replace myenv with the name of your Conda environment, and if you are
migrating multiple environments, then use a memorable name for each one
(myenv1.yml, myenv2.yml etc.) and repeat the process above for each one.
Once you have backed them all up you can
re-create them later on.
Re-creating environments¶
Not all environments can be easily re-created
Whilst some environments will be easily re-created, not all of them will. If you experience issues re-creating environments as detailed below, please raise a ticket and we will offer more detailed support.
Re-creating Python virtualenvs¶
Presuming you have correctly backed up your
virtualenvs, then you can restore them in an interactive salloc session:
salloc
module load python
virtualenv myenv
source myenv/bin/activate
pip install -r requirements.txt
You may want to load a specific Python version module rather than the default,
in which case use module load python/<version>
(e.g. module load python/3.12) instead of just module load python.
Repeat the process if you have multiple virtualenvs frozen to
requirements.txt files.
Re-creating Conda environments¶
Creating a Conda environment requires more resources
A default salloc request (i.e. with no arguments) is 1 hour, 1 core and
1GB RAM per core. This often isn't enough for creating a Conda environment.
Most environment creation is single-core, so 1 core for 24 hours (1 hour is often not enough for more complex environments) with 8GB RAM is recommended:
salloc --mem-per-cpu=8G --time=24:0:0
Check for the defaults channel
Please review any environment.yml files and check if the defaults
channel is defined in the channels: section; if it is, replace it
with nodefaults before proceeding with the instructions below (as per the
information above).
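One possible way to make this replacement with sed is sketched below; the sample file contents are illustrative only, and you should check the result of the substitution by eye afterwards, as channel lists vary.

```shell
# Example exported environment file that still lists the defaults channel
printf 'channels:\n  - defaults\n  - conda-forge\ndependencies:\n  - python=3.12\n' > environment.yml

# Replace the bare "- defaults" channel entry with "- nodefaults" in place
sed -i 's/^\([[:space:]]*-[[:space:]]*\)defaults[[:space:]]*$/\1nodefaults/' environment.yml

cat environment.yml
```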
Presuming you have correctly backed up your
Conda environments, then you can
re-create
them in an interactive salloc session:
salloc --mem-per-cpu=8G --time=24:0:0
module load miniforge
mamba env create -f environment.yml
Repeat the process if you have multiple Conda environments exported to
environment.yml files.
Frequently Asked Questions¶
Does Apocrita offer any modules for uv?¶
uv is a very popular tool for managing Python environments, but we don't offer
a module as it has not (at the time of writing) reached a v1 release. Usage of
uv on Apocrita is covered in a separate blog post.
I've loaded a Python module but get a "-bash: pip: command not found" error¶
On Apocrita, the pip binary will not be available until you are in a correctly
activated
Python virtualenv.
This is to try to avoid
"orphaned" packages ending up in your home directory.
My job failed with "ERROR: Unable to locate a modulefile for 'python/3.10'"¶
There is no module for Python 3.10 on Apocrita. You'll either need to use an
available version (running module avail python will list all available
versions), or if you really need Python 3.10, you'll need to
create a Conda environment
specifying python=3.10 at the time of creation.
My Conda installs keep failing with "error libmamba Could not open lockfile"¶
The following error can be safely ignored:
error libmamba Could not open lockfile
'/share/apps/rocky9/general/apps/miniforge/24.7.1/pkgs/cache/cache.lock'
conda-forge/linux-64 Using cache
error libmamba Could not open lockfile
'/share/apps/rocky9/general/apps/miniforge/24.7.1/pkgs/cache/cache.lock'
conda-forge/noarch Using cache
This is purely a warning, due to an unfixed bug (see here and here) in Mamba when the base environment is read-only (which it is on Apocrita, as it is shared between all users).
Despite the lockfile warning, the default location of ~/.conda is still used for all Conda packages and environments, even without any environment exports or ~/.condarc file in place.
I'm seeing Conda errors about cache files being modified by another program¶
If you see errors such as:
warning libmamba Cache file "/data/home/abc123/.conda/pkgs/cache/d4808d92.json"
was modified by another program
nodefaults/linux-64 (check zst) Checked 0.4s
or:
Preparing transaction: done
Verifying transaction: \
SafetyError: The package for r-base located at
/data/home/abc123/.conda/pkgs/r-base-4.4.2-hc737e89_2 appears to be corrupted.
The path 'lib/R/doc/html/packages.html'
has an incorrect size.
reported size: 3423 bytes
actual size: 61857 bytes
ClobberError: The package
'conda-forge/noarch::sysroot_linux-64-2.17-h0157908_18' cannot be installed due
to a path collision for 'x86_64-conda-linux-gnu/sysroot/lib'.
This path already exists in the target prefix, and it won't be removed
by an uninstall action in this transaction. The path is one that conda
doesn't recognize. It may have been created by another package manager.
You need to make sure you have cleared your Conda cache as detailed above.
My Conda installs keep failing with "perhaps a missing channel"¶
Please check your ~/.condarc file is correct as
detailed above.
