Error: cannot find -lmkl_intel_lp64 while compiling on Ubuntu 20.04

Ubuntu 20 is climbing the ranks as the heir of CentOS 7 here, and I have started to experience compatibility issues. I was compiling RELION with the following options

cmake -DAMDFFTW=ON -DCUDA_ARCH=86 -DCUDA=ON -DCudaTexture=ON \
-DFORCE_OWN_FLTK=ON -DCMAKE_CXX_COMPILER=g++ -DCMAKE_C_COMPILER=gcc \
-DMPI_C_COMPILER=/usr/bin/mpicc \
-DMPI_C_LIBRARIES=/usr/lib/x86_64-linux-gnu/openmpi/lib/ \
-DMPI_C_INCLUDE_PATH=/usr/lib/x86_64-linux-gnu/openmpi/include/ \
-DCUDA_ARCH=61 -DCUDA=ON -DCudaTexture=ON -DMKLFFT=ON \
-DFORCE_OWN_FLTK=ON -DGUI=ON \
-DCMAKE_INSTALL_PREFIX=/XXX/relion_local/ \
-D CMAKE_BUILD_TYPE=Release ..

when I found this error:

[ 59%] Linking CXX executable ../../bin/relion_tomo_tomo_ctf
/usr/bin/ld: cannot find -lmkl_intel_lp64
/usr/bin/ld: cannot find -lmkl_sequential
/usr/bin/ld: cannot find -lmkl_core
collect2: error: ld returned 1 exit status
make[2]: *** [src/apps/CMakeFiles/tomo_ctf.dir/build.make:102:
bin/relion_tomo_tomo_ctf] Error 1
make[1]: *** [CMakeFiles/Makefile2:352:
src/apps/CMakeFiles/tomo_ctf.dir/all] Error 2
make: *** [Makefile:130: all] Error 2

I'm using the default package install locations, no modules or anything fancy. So it looks like some libraries are missing. Of course I try to find them with ldconfig -p | grep 'mkl', but they are obviously not there. Let's install them:

# apt-get install libmkl*intel* libmkl*se* libmkl*core*
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'libmkl-intel-thread' for glob 'libmkl*intel*'
Note, selecting 'libmkl-blacs-intelmpi-ilp64' for glob 'libmkl*intel*'
Note, selecting 'libmkl-intel-ilp64' for glob 'libmkl*intel*'
Note, selecting 'libmkl-blacs-intelmpi-lp64'
for glob 'libmkl*intel*'
Note, selecting 'libmkl-intel-lp64' for glob 'libmkl*intel*'
Note, selecting 'libmkl-sequential' for glob 'libmkl*se*'
Note, selecting 'libmkl-core' for glob 'libmkl*core*'
Note, selecting 'libmkl-cdft-core' for glob 'libmkl*core*'
The following additional packages will be installed:
libmkl-def libmkl-locale libmkl-vml-def
The following NEW packages will be installed:
libmkl-blacs-intelmpi-ilp64 libmkl-blacs-intelmpi-lp64
libmkl-cdft-core libmkl-core libmkl-def
libmkl-intel-ilp64 libmkl-intel-lp64
libmkl-intel-thread libmkl-locale
libmkl-sequential libmkl-vml-def

0 upgraded, 11 newly installed, 0 to remove and 19 not upgraded.
Need to get 38,4 MB of archives.
After this operation, 204 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
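
Once the packages are in, a quick sanity check that the linker can now see the MKL libraries doesn't hurt (just a sketch; the paths are the Ubuntu 20.04 defaults, so adjust if yours differ):

ldconfig -p | grep mkl                     # libmkl_intel_lp64, libmkl_sequential and libmkl_core should show up now
dpkg -L libmkl-intel-lp64 | grep '\.so'    # where the package actually dropped the library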

You may need to run apt-get install mklibs in addition. Of course it depends on what you have been doing with your system so far, since this will pull in some Python libraries as well. After getting those packages, make and make install run without errors. Time to ask the user to test it…

Python Import Error: can’t import name gcd from fractions

Deprecation is a big issue in Python. I'm in need of Molecular Dynamics (MD) simulation tools. The error above comes from one tool I already posted about, called LipIDens; more specifically, it's a complaint thrown by vermouth. Vermouth (for VERsatile, MOdular, and Universal Transformation Helper) is also a drink and the Python library that powers Martinize2. The vermouth source is here. It is supposed to be used to apply transformations to molecular structures. Which means I don't really know what it does! Anyway, my error reads:

Installed 
/usr/local/lib/python3.9/site-packages/lipidens-1.0.0-py3.9.egg
Processing dependencies for lipidens==1.0.0
error: networkx 3.0 is installed but
networkx~=2.0 is required by {'vermouth'}

What to do here? I found the solution and the explanation, once more, on StackOverflow. It's very interesting to learn that gcd left the fractions module: it was deprecated there after Python 3.5 and removed in Python 3.9 (it now lives in math), which is exactly what breaks older code that still imports it from fractions. So why does the LipIDens documentation recommend a Python above 3.9? I'm going to leave that question open (old developer environments with leftovers, or insufficient testing) and show you my solution: we install a specific version of the networkx package, one that both satisfies vermouth's networkx~=2.0 pin and still works on a modern Python. I chose pip to install it instead of conda because it goes into my Python site-packages, which I personally consider a more elegant solution.
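
If you want to see the change behind the error for yourself, a quick check from the shell makes it obvious (python3 here stands for the 3.9 interpreter LipIDens is using; adjust to yours):

python3 -c "from fractions import gcd"                  # fails on Python 3.9+ with an ImportError
python3 -c "from math import gcd; print(gcd(12, 18))"   # prints 6: this is where gcd lives now

Here you have my pip output: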

bash-5.1# pip install networkx==2.5
Collecting networkx==2.5
Downloading networkx-2.5-py3-none-any.whl (1.6 MB)
|XXXXXXX| 1.6 MB 4.3 MB/s
Requirement already satisfied: decorator>=4.3.0 in /usr/local/lib/python3.9/site-packages (from networkx==2.5) (5.1.1)
Installing collected packages: networkx
Attempting uninstall: networkx
Found existing installation: networkx 2.0
Uninstalling networkx-2.0:
Successfully uninstalled networkx-2.0
Successfully installed networkx-2.5
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
bash-5.1# python setup.py install

After the pip install into my local Python I run the LipIDens installer again and it works. Getting meaningful results from the program is another issue! BTW, I decided to keep writing my bits thanks to some good feedback that I was missing before… thank you guys. I appreciate it.

ERROR : /usr/share/Modules/init/sh: No such file or directory on Ubuntu 22.04

We have started the migration to a new Linux flavour – in principle Ubuntu – and therefore we have also started to experience the first issues. We have a well-established software module setup, with the module definitions loaded from a network location. We can install software modules, but after creating every folder needed and making every other adjustment, we see this:

$ > ssh user@new-server
Last login: DATE from other-server.org
-bash: /usr/share/Modules/init/sh: No such file or directory

I have a machine, "other-server", to check how the modules are supposed to work, and I find that the folder "Modules" is missing on Ubuntu. The content of the lowercase "modules" folder in the same location seems to be the same as what I have on other-server, so I do:

root@new-server:/usr/share# ln -s /usr/share/modules Modules

And try logging in again, with this result:

$ > ssh user@new-server
user@new-server's password:
Last login: DATE from other-server.org
user@new-server ~ $ > module avail
/PATH/modulecmd:
error while loading shared libraries:
libtcl8.5.so: cannot open shared object file:
No such file or directory

So now there's a library problem. Let's try to fix it very quickly. I locate a similar library:

root@new-server:~# ls /usr/lib/*/*libtcl*
/usr/lib/x86_64-linux-gnu/libtcl8.6.so
/usr/lib/x86_64-linux-gnu/libtcl8.6.so.0
/usr/lib/x86_64-linux-gnu/libtclenvmodules.so
root@new-server:~# cp /usr/lib/x86_64-linux-gnu/libtcl8.6.so /usr/lib/libtcl8.5.so
root@new-server:~# ldconfig

And the error is gone. We can get away with this because the two library versions are not that different… I have no idea about the side effects. Let's hope there aren't many 😉!
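
If you prefer something easier to undo later, a symlink instead of a copy should behave the same way (a minimal sketch, with exactly the same caveats as the copy above):

root@new-server:~# ln -s /usr/lib/x86_64-linux-gnu/libtcl8.6.so /usr/lib/libtcl8.5.so
root@new-server:~# ldconfig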

Virtual machine manager error: no connection driver available for qemu:///system on CentOS 7

I'm trying to set up a nice full-simulation environment on my dying CentOS 7.X system. That means neither a Docker nor an LXC solution, but a full OS with its own IP and so on: as close to the real thing as possible, but running on CentOS 7. I have a clean machine, and I remember QEMU as the tool that does everything I want. So I install it, start the service, and launch the GUI, like this:

yum install qemu-kvm qemu-img virt-manager libvirt-daemon
systemctl start libvirtd
virt-manager &

The GUI pops up but it gives me the error above, which I solve in one of the ways described in this post. I updated and enabled the service without luck, so I go for missing packages:

yum -y install qemu-kvm qemu-img virt-manager \
libvirt libvirt-python python-virtinst \
libvirt-client virt-install virt-viewer

After that, with no need to reboot, I get my GUI and I can start playing with VMs. I will report my findings, if any 🧐. BONUS: another post about a similar issue.
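
If you want to double-check the connection from the command line before (or instead of) the GUI, virsh should now be able to talk to the system instance (a quick sanity check, assuming the default qemu:///system URI):

systemctl status libvirtd
virsh -c qemu:///system list --all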

HOWTO: run a GUI in a docker

I'm trying to get this ChimeraX docker running on my CentOS 7.9, with the latest ChimeraX. It turned out I can't, since I don't have Qt6 and support for that CentOS version has been dropped, but it has been an interesting experiment, enough to log it. The ChimeraX docker image builds fine when you fetch it, so in principle it looks like it should work. The documentation, unfortunately, doesn't tell you how to start a sample container. I will tell you:

docker run -i -t --name chimeraXtest \
--net=host --privileged -e DISPLAY \
-v /tmp/.X11-unix:/tmp/.X11-unix:ro \
chimerax:latest /bin/bash

If you have been paying attention to my docker notes, this will drop you into a bash shell inside the ChimeraX container, which seems to be Ubuntu 20 based. One can install and run GUIs from that shell, as the quick test below shows.
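
Any small X11 application will do as a guinea pig; if the X socket and DISPLAY made it through, it should pop up on the host display (nedit and xeyes are just convenient examples here; you may also need to allow local connections with xhost +local: on the host first):

apt-get update && apt-get install -y nedit x11-apps
nedit &     # should appear on the host display
xeyes       # likewise

But the very thing we want to run crashes, like this: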

ImportError: libQt6Core.so.6: cannot open shared object file: No such file or
directory

BUG: ImportError: libQt6Core.so.6: cannot open shared object file: No such file or
directory

File "/usr/lib/ucsf-chimerax/lib/python3.9/site-packages/Qt/__init__.py", line
64, in
from PyQt6.QtCore import PYQT_VERSION_STR as PYQT6_VERSION

_See log for complete Python traceback._

There's no obvious solution to this import error. Maybe I will investigate how to run a Qt6 docker container for CI. Or do you have a better suggestion? Check this post: docker x11 fails to open display. More dockers tomorrow, maybe, if I have time 😉.

ERROR: failure: repodata/repomd.xml from kubernetes: [Errno 256] No more mirrors to try (fix on CentOS 7.X)

I'm trying to get Kubernetes integrated into my SLURM cluster, so I started deploying a Kubernetes cluster again. Unfortunately, my previous step-by-step Kubernetes install now fails at Step 2. This is my output, edited as usual to obscure irrelevant information:

## > yum install -y kubelet kubeadm kubectl
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
* base: ftp.halifax.rwth-aachen.de
* centosplus: mirror.checkdomain.de
* epel: ftp.halifax.rwth-aachen.de
* epel-testing: ftp.halifax.rwth-aachen.de
* extras: ftp.rz.uni-frankfurt.de
* rpmfusion-free-updates: mirror.netsite.dk
* updates: ftp.rrzn.uni-hannover.de
kubernetes/signature | 844 B 00:00:00
Retrieving key from
https://packages.cloud.google.com/yum/doc/yum-key.gpg
Importing GPG key 0x13EDEF05:
Userid : "Rapture Automatic Signing Key
(cloud-rapture-signing-key-2022-03-07-08_01_01.pub)"
Fingerprint: a362 b822 f6de dc65 2817 ea46 b53d c80d 13ed ef05
From : https://packages.cloud.google.com/yum/doc/yum-key.gpg
Retrieving key from
https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
kubernetes/signature | 1.4 kB 00:00:00 !!!
https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64/repodata/repomd.xml:
[Errno -1] repomd.xml signature could not be verified for kubernetes
Trying other mirror.

One of the configured repositories failed (Kubernetes),
and yum doesn't have enough cached data to continue.
At this point the only safe thing yum can do is fail.
There are a few ways to work "fix" this:

1. Contact the upstream for the repository and
get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository,
to point to a working upstream.
This is most often useful if you are using a newer
distribution release than is supported by the repository
(and the packages for the previous distribution
release still work).
3. Run the command with the repository temporarily disabled
yum --disablerepo=kubernetes ...
4. Disable the repository permanently,
so yum won't use it by default.
Yum will then just ignore the repository
until you permanently enable it again
or use --enablerepo for temporary usage:
yum-config-manager --disable kubernetes
or
subscription-manager repos --disable=kubernetes
5. Configure the failing repository to be skipped,
if it is unavailable. Note that yum will try
to contact the repo. when it runs most commands,
so will have to try and fail each time
(and thus. yum will be be much slower).
If it is a very temporary problem though,
this is often a nice compromise:
yum-config-manager --save
--setopt=kubernetes.skip_if_unavailable=true

failure: repodata/repomd.xml from kubernetes:
[Errno 256] No more mirrors to try.
https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64/repodata/repomd.xml:
[Errno -1] repomd.xml signature could not be verified for kubernetes

I found the solution in this post. Basically, I rewrite the repo definition so that it fetches the rpm package key only and skips the repodata signature check. Like this:

[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=0
gpgkey=https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg

After that, yum does the install and I can go ahead.
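
If yum still trips over stale metadata after the change, cleaning the cache before retrying usually sorts it out (a small aside; this assumes the repo definition lives in the usual /etc/yum.repos.d/kubernetes.repo):

yum clean all
yum makecache fast
yum install -y kubelet kubeadm kubectl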

MAAS ERROR: psycopg2.OperationalError: FATAL: Ident authentication failed for user "maas" on CentOS 7.9

Just yesterday I posted about how to install MAAS on CentOS 7.X. I ended up with quite a nice test web UI, but I was not able to go into production with it. To have a functional MAAS (not the dummy web UI) you need a database, and the weapon of choice for MAAS seems to be PostgreSQL. I'm not familiar with it, so I can't tell you a lot about it. A database is a database, as they say in my village. I'm trying to start my MAAS with the username and password below, which I have no regrets copying here:

sudo -u postgres \
psql -c "CREATE USER maas WITH ENCRYPTED PASSWORD 'maas';"
sudo -u postgres createdb maasdb -O maas

Yeah, not very original. I took it from here. I modify the configuration (how to find it) in /var/lib/pgsql/data/pg_hba.conf as suggested, restart the postgresql service with systemctl restart postgresql, and try to init MAAS like this:

# > maas init region+rack --database-uri
"postgres://maas:maas@localhost/maasdb"

Controller has already been initialized.
Are you sure you want to initialize again (yes/no) [default=no]? yes
MAAS URL [default=http://X.X.X.X:5240/MAAS]:
Failed to perfom migrations:ations
Traceback (most recent call last):
... bla bla bla...
psycopg2.OperationalError:
FATAL: Ident authentication failed for user "maas"

The above exception was the direct cause of
the following exception:
Traceback (most recent call last):
... bla bla bla ...
django.db.utils.OperationalError:
FATAL: Ident authentication failed for user "maas"

We have an ident user ID or permission problem. This happened to me before with MySQL databases! So I think I know where to look, more or less: first at the service configuration, then at the user authentication itself. I look for a solution to the FATAL error and end up leaving my pg_hba.conf like this:

# TYPE  DATABASE   USER   ADDRESS        METHOD
local   all        all                   peer
host    all        all    127.0.0.1/32   md5
host    all        all    ::1/128        md5
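
By the way, if you would rather not bounce the whole service every time you touch pg_hba.conf, asking the server to reload its configuration should be enough (just an aside; the restart below works fine too):

sudo -u postgres psql -c "SELECT pg_reload_conf();"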

Then I restart the service and test the login with user maas password maas:

# > psql -h localhost -U maas -d maasdb
Password for user maas:
psql (9.2.24)
Type "help" for help.
maasdb=> \q

So it works! Time to go for the MAAS init:

# > maas init region+rack --database-uri 
"postgres://maas:maas@localhost/maasdb"

Controller has already been initialized.
Are you sure you want to initialize again (yes/no) [default=no]? yes
MAAS URL [default=http://X.X.X.X:5240/MAAS]:
Failed to perfom migrations:ations
Traceback (most recent call last):
...bla bla bla...
maasserver.plugin.UnsupportedDBException:
Unsupported postgresql server version (90224) detected

Which means I need to uninstall PostgreSQL and install a newer version. Which also means I have the theme for the next post 😁😁. Well, that's life!

GPU programming tests on CentOS 7

As a benchmarking fan, I was looking for a minimal piece of code able to show the difference between CPU and GPU calculations. I found it in this linuxhint article. At the bottom there are two links for further reading: GPU programming with C++ and GPU programming with Python. I've tried both on my CentOS 7.9 with a Quadro K5200. The first step is to install the default CUDA:

yum install cuda

In my case, the above command installed cuda-11.7. I previously had the drivers for the GPU installed. Unfortunately, that seems not to be enough: you also need the NVIDIA compiler, nvcc. Very easy to install if you don't have it:

yum install nvcc

The compiler lands in /usr/local/cuda/bin/nvcc. You may want to add that folder to your PATH as well; I didn't. Time to copy the sample code given above. I save everything in my project folder and type make as suggested:

gputest ## > make
/usr/local/cuda/bin/nvcc
-std=c++11 gpu-example.cpp -o gpu-example
gpu-example.cpp: In function
'void vector_add_gpu(int*, int*, int*, int)':
gpu-example.cpp:24:13: error: 'threadIdx'
was not declared in this scope
int i = threadIdx.x;
^
gpu-example.cpp: In function 'int main()':
gpu-example.cpp:64:22: error: expected primary-expression
before '<' token
vector_add_gpu <<<1, ITER>>>
(gpu_a, gpu_b, gpu_c, ITER);
^
gpu-example.cpp:64:32: error: expected primary-expression
before '>' token
vector_add_gpu <<<1, ITER>>>
(gpu_a, gpu_b, gpu_c, ITER);
^
make: *** [all] Error 1

In case it's not clear, the code copied as-is didn't work. I check that the nvcc compiler is there (it is!), so it's something else. From this NVIDIA forum post I gather that the problem is the file naming convention I'm using: nvcc compiles .cpp files as plain host C++ and only accepts the CUDA extensions (threadIdx, the <<<...>>> kernel launch syntax) in .cu files. So I rename my file and adjust the Makefile accordingly (a sketch follows below):

mv gpu-example.cpp gpu-example.cu
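
For reference, a minimal Makefile along these lines should do the job (a sketch reconstructed from the nvcc invocation below; adjust the CUDA path if yours differs, and remember that recipe lines must start with a tab):

NVCC = /usr/local/cuda/bin/nvcc

all: gpu-example

gpu-example: gpu-example.cu
	$(NVCC) -std=c++11 gpu-example.cu -o gpu-example

clean:
	rm -f gpu-example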

After that, make works and I get a binary that I can run. This is my output.

gputest ## > make
/usr/local/cuda/bin/nvcc -std=c++11 gpu-example.cu -o gpu-example
gputest ## > ls
gpu-example* gpu-example.cu Makefile
gputest ## > ./gpu-example
vector_add_cpu: 432218 nanoseconds.
vector_add_gpu: 19573 nanoseconds.

Later runs give me different numbers, but of the same order: the GPU calculation is roughly 20 times faster than the CPU one. Further analysis is needed 🧐

systemd-analyze and daemon management on Linux

I feel like I need to write this down. I was fighting with a rebel docker process on a multi-purpose server, so I was forced to review the services. The first thing I did was check the start-up time of each service. You can find a detailed post on geekdiary. It is done like this:

systemd-analyze blame
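
A complementary view, if blame alone is not conclusive, is the dependency chain of the slowest units (also part of systemd-analyze):

systemd-analyze critical-chain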

I'm not going to copy my output, but I'll say I didn't find any clear culprit. Here you have the official systemd-analyze man page. Let's say I really suspect the docker daemon. Where's the service file? We can cat the service like this (partial output):

root@bad ~ ## > systemctl cat docker
# /usr/lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target docker.socket firewalld.service containerd.service
Wants=network-online.target
Requires=docker.socket containerd.service

[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutSec=0
RestartSec=2
Restart=always
...

As you see, it gives you the location of the file as well as its content. Now I can change the timeout, the restart options, or whatever I want; if you only need to tweak a value or two, a drop-in override is the cleaner route (see the sketch below). I go anyway for reinstalling the service (docker) completely and reviewing my docker monitoring code, so that it doesn't make so many calls to the daemon, and after that my service lagging problem is gone. BTW, my docker error is described here. Long story short, there was nothing wrong with the service but with my code. So review your code and test it as much as you can! 😉
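
About that drop-in override: rather than editing /usr/lib/systemd/system/docker.service in place, systemd lets you overlay only the settings you want to change (a minimal sketch; the TimeoutSec value is just an example, not a recommendation):

systemctl edit docker        # opens an override file under /etc/systemd/system/docker.service.d/
# put only the keys you want to change in the override, e.g.:
# [Service]
# TimeoutSec=30
systemctl daemon-reload
systemctl restart docker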

Internal Server Error after update : munin not available

Sorry I don't write here as frequently as before, but I've been busy. The scenario: we are forced to update everything to the latest kernel and to clean out all the extra packages. I don't do it myself, so when I'm back on my server and restart my munin service, on the web side I get this message:

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator at root@localhost to inform them of the time this error occurred, and the actions you performed just before this error.

More information about this error may be available in the server error log.

I'm the server administrator, so I need to contact myself 😂😂😂. First, of course, I restart all the services (httpd, munin, munin-node). That has no effect, so I stop the munin services, clean all the logs, and start them again. The log shows errors, but nothing strange:

/var/log/munin ## > tailf munin-update.log 
DATE [INFO] Reaping Munin::Master::UpdateWorker
<XX>. Exit value/signal: 0/0
DATE [ERROR] In RRD: Error updating XXX
GPU_TEMP-g.rrd: conversion of 'N/A'
to float not complete: tail 'N/A'

How about the httpd service itself? Let’s have a look:

/var/www/html ## > tailf /var/log/httpd/error_log
</p>
[DATE] munin-cgi-html:
Can't open /var/log/munin/munin-cgi-html.log
(Permission denied) at /usr/share/perl5/vendor_perl/Log/Log4perl/Appender/File.pm
line 103.
<h1>Software error:</h1>
<pre>Can't open /var/log/munin/munin-cgi-html.log
(Permission denied) at /usr/share/perl5/vendor_perl/Log/Log4perl/Appender/File.pm
line 103.
</pre>
<p>
For help, please send mail to this site's webmaster,
giving this error message
and the time and date of the error.
</p>

Again, oddly enough, the message recommends I contact myself 😁😁. That I do, of course. Is there an easy solution for this? Reinstall? Move to a docker version? No, it's much easier. I stop all the munin services and clean the logs. Then I do:

/var/log ## > chown apache:munin /var/log/munin/

and systemctl restart munin. And my munin monitoring is back. For the record:

/var/log ## > ls -lh
...
drwx------ 2 munge munge 4096 DATE munge/
drwxr-x--- 2 apache munin 4096 DATE munin/
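
Depending on how the logs were (re)created, the files inside the directory may also belong to the wrong owner. If the error persists, chowning those as well should cover it (a hedged extra step: mine were recreated fresh after cleaning, so I didn't need it):

chown apache:munin /var/log/munin/munin-cgi-*.log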

Time for a break. A summer break? Not yet 😩😩😩…