CryoSPARC not starting after update to v2.8 on CentOS 7.X : bad timing interval

As usual, click here if you want to know what cryoSPARC is. I have created a cryoSPARC master-client setup. In principle the update from v2.5 to v2.8 went through successfully after running cryosparcm update in a shell, the standard procedure. Master and clients all got updated. But after the update I rebooted everything, and after the reboot of the master node the problems started. This is the symptom:

cryosparcm start
Starting cryoSPARC System master process..
CryoSPARC is not already running.
database: started
command_core: started

And the starting hangs there. The message telling you where to go to access your server never appears. Of course I waited. The status looks like this:

cryosparcm status
--------------------------------------------------
CryoSPARC System master node installed at
/XXX/cryosparc2_master
Current cryoSPARC version: v2.8.0
----------------------------------------------
cryosparcm process status:
command_core                     STARTING 
command_proxy                    STOPPED   Not started
command_vis                      STOPPED   Not started
database                         RUNNING   pid 49777, uptime XX
watchdog_dev                     STOPPED   Not started
webapp                           STOPPED   Not started
webapp_dev                       STOPPED   Not started
------------------------------------------------
global config variables:
export CRYOSPARC_LICENSE_ID="XXX"
export CRYOSPARC_MASTER_HOSTNAME="master"
export CRYOSPARC_DB_PATH="/XXX/cryosparc_database"
export CRYOSPARC_BASE_PORT=39000
export CRYOSPARC_DEVELOP=false
export CRYOSPARC_INSECURE=false

It looks just like this cryosparc forum post; unfortunately, no solution is given there. We can also check what the webapp log says:

 cryosparcm log webapp
    at listenInCluster (net.js:1392:12)
    at doListen (net.js:1501:7)
    at _combinedTickCallback (XXX/next_tick.js:141:11)
    at process._tickDomainCallback (XXX/next_tick.js:218:9)
cryoSPARC v2
Ready to serve GridFS
events.js:183
      throw er; // Unhandled 'error' event
      ^
Error: listen EADDRINUSE 0.0.0.0:39000
    at Object._errnoException (util.js:1022:11)
    at _exceptionWithHostPort (util.js:1044:20)
    at Server.setupListenHandle [as _listen2] (net.js:1351:14)
    at listenInCluster (net.js:1392:12)
    at doListen (net.js:1501:7)
    at _combinedTickCallback (XXX/next_tick.js:141:11)
    at process._tickDomainCallback (XXX/next_tick.js:218:9)

It looks like a port problem rather than anything cryoSPARC-specific: EADDRINUSE stands for "address in use", and the trace comes from the node.js webapp failing to bind 0.0.0.0:39000, our base port. So which process is creating the listening error?
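One generic way to answer that, nothing cryoSPARC-specific: ask the OS which process holds the port. Port 39000 is the CRYOSPARC_BASE_PORT from the config above; whether you have ss or lsof depends on the box.

```shell
# Ask the kernel which process is listening on the cryoSPARC base port
# (ss comes with iproute2; lsof -i :39000 is an alternative if installed)
ss -lptn 'sport = :39000'
# If a stray supervisord or node process shows up, stop it, e.g.:
# pgrep -af supervisord
# kill <PID>
```

Running it without root may hide the process name of other users, but the port itself will still be listed.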

I clean up as suggested in this cryosparc post, or in this one: deleting things under /tmp/ and trying to find and kill any rogue supervisord process. I don't have one. Next I reboot the master, but the problem persists. Messing with MongoDB doesn't help either. What now? The cryoSPARC update installed a new python, so I decide to force a reinstall of the dependencies. It is done like this:

cryosparcm forcedeps
  Checking dependencies... 
  Forcing dependencies to be reinstalled...
  --------------------------------------------------
  Installing anaconda python...
  --------------------------------------------------
..bla bla bla...
 Forcing reinstall for dependency mongodb...
  --------------------------------------------------
  mongodb 3.4.10 installation successful.
  --------------------------------------------------
  Completed.
  Completed dependency check. 

If I believe what the software tells me, everything is fine. I reboot and run cryosparcm start, but my command_core still hangs on STARTING. After several hours of investigation, I decide on a drastic solution: install everything again. And then I find it.

 ./install.sh --license $LICENSE_ID \
--hostname sparc-master.org \
--dbpath /my-cs-database/cryosparc_database \
--port 39000
ping: bad timing interval
Error: Could not ping sparc-master.org

What is this bad timing interval? The message is ping's own complaint, apparently about the interval argument it is given, so the installer's ping check is what fails, not anything Java-related as I first suspected. I access my servers via SSH + VPN, so it could be that the ping found in that remote environment behaves differently, or that the time servers we use confuse it, or something else. In any case, I approach it another way. I need to be closer. But how?

I open a virtual desktop there, and in it I call an ubuntu shell where I run my installer. Et voilà! The bad timing is gone, and the install goes on without any further issues. Note that I do a new install reusing the previous database (--dbpath /my-cs-database/cryosparc_database), so that everything, even my users, is the same as before 🙂

Long story short: shells may look the same but behave differently. Be warned!
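One concrete way two "identical-looking" shells differ is in what they actually resolve for a command. A quick generic comparison, run in each environment (nothing here is specific to the cryoSPARC installer):

```shell
# Print what this shell would run for a given command, and its search path;
# run the same lines in both shells and diff the answers
command -v ping || echo "ping: not found"
echo "$PATH"
```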


CryoSPARC 2 slurm cluster worker update error

This is about CryoSPARC again. Previously we installed it on CentOS and updated it, but on a master + node configuration, not on a cluster configuration. If it's a new install on your slurm cluster, you should follow the master installation guide, which tells you to do a master install on the login node and then, on the same login node, install the worker:

module load cuda-XX
cd cryosparc2_worker
./install.sh --license $LICENSE_ID --cudapath <path-to-cuda>

The situation is that we updated the master node, but the Lane default (cluster) didn't get the update, and jobs crash because of it. First we uninstall the worker using one of the management tools, like this:

cryosparcm cli 'remove_scheduler_target_node("cluster")'

Then we cryosparcm stop, and we move the old worker software folder

mv cryosparc2_worker cryosparc2_worker_old

and get a new copy of the worker software with curl.

curl -L https://get.cryosparc.com/download/worker-latest/$LICENSE_ID \
  > cryosparc2_worker.tar.gz

We cryosparcm start, then untar, cd, and install. Don't forget to add your LICENSE_ID and to load the cuda module, or be sure you have a default cuda. This is an edited extract of my worker install:

******* CRYOSPARC SYSTEM: WORKER INSTALLER ***********************

Installation Settings:
License ID :  XXXX
Root Directory : /XXX/Software/Cryosparc/cryosparc2_worker
Standalone Installation : false
Version : v2.5.0

******************************************************************

CUDA check..
Found nvidia-smi at /usr/bin/nvidia-smi

CUDA Path was provided as /XXX/cuda/9.1.85
Checking CUDA installation...
Found nvcc at /XXX/cuda/9.1.85/bin/nvcc
The above cuda installation will be used but can be changed later.

***********************************************************

Setting up hard-coded config.sh environment variables

***********************************************************

Installing all dependencies.

Checking dependencies... 
Dependencies for python have changed - reinstalling...
---------------------------------------------------------
Installing anaconda python...
----------------------------------------------------------
PREFIX=/XXX/Software/Cryosparc/cryosparc2_worker/deps/anaconda
installing: python-2.7.14-h1571d57_29 ...

...anaconda being installed...
installation finished.
---------------------------------------------------------
Done.
anaconda python installation successful.
---------------------------------------------------------
Preparing to install all conda packages...
-----------------------------------------------------------
----------------------------------------------------------
Done.
conda packages installation successful.
------------------------------------------------------
Preparing to install all pip packages...
----------------------------------------------------------
Processing ./XXX/pip_packages/Flask-JSONRPC-0.3.1.tar.gz

Running setup.py install for pluggy ... done
Successfully installed Flask-JSONRPC-0.3.1 
Flask-PyMongo-0.5.1 libtiff-0.4.2 pluggy-0.6.0 
pycuda-2018.1.1 scikit-cuda-0.5.2
You are using pip version 9.0.1, 
however version 19.1.1 is available.
You should consider upgrading via the
 'pip install --upgrade pip' command.
-------------------------------------------------------
Done.
pip packages installation successful.
-------------------------------------------------------
Main dependency installation completed. Continuing...
-------------------------------------------------------
Completed.
Currently checking hash for ctffind
Dependencies for ctffind have changed - reinstalling...
--------------------------------------------------------
ctffind 4.1.10 installation successful.
--------------------------------------------------------
Completed.
Currently checking hash for gctf
Dependencies for gctf have changed - reinstalling...
-------------------------------------------------------
Gctf v1.06 installation successful.
-----------------------------------------------------------
Completed.
Completed dependency check.

******* CRYOSPARC WORKER INSTALLATION COMPLETE *****************

In order to run processing jobs, you will need to connect this
worker to a cryoSPARC master.

****************************************************************

We are adding a worker that was somehow previously there, so I don't do anything else. If I check the web interface, Lane default (cluster) is back. Extra tip: see the forum entries about a wrong default cluster_script.sh, and about the Slurm settings for cryosparc v2.

If I need to add something: be aware that the worker install seems to come with its own python, and it does reinstall ctffind and gctf. So be careful if you run python things in addition to cryosparc 🙂
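Following that warning, a quick generic check of which python a given shell resolves is worth the two seconds. I use python3 here just as an example; a cryoSPARC 2 worker of this era ships its own python 2.7 under deps/.

```shell
# Which python binary would this shell run, and which version is it?
command -v python3
python3 --version
```

Run it before and after pulling cryoSPARC paths into your environment and compare.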

Windows 7 Update error 8007000e

Back to work, I found out that one of the Windows 7 PCs that runs VIP hardware crashed, like the old computer it indeed is. Windows 7 is already 10 years old, but it's still the weapon of choice for VIP hardware. The reason is clear: developing and upgrading the drivers (and the software) is no easy candy for the companies. I understand it very well.

Anyway, time for a complete Windows 7 reinstall on a computer that was previously running it. Fortunately I kept the original disk and the original Product Key. I remove the useless old disk (it's not even readable, and it makes funny noises when plugged in), put in a new SSD, and go through the easy Windows 7 install. I give it the same PC name and configure the same user, so that there is no trouble for the hardware clients. After typing the Product Key, the total installation time is around one hour. Not too bad. I can't say whether that's long or short; I don't have much data to compare with. It's when I try to update the system to the latest patch that I find two issues.

Issue number one: the Product Key I entered during the install was not accepted as valid in the end. Why is that? The hardware (which is checked to validate the key) is the same. Maybe it was not registered at installation time? I try entering it again and it works.

Issue number two: the above windows update error. I follow the solution that you can find here; the picture is also from there. The problem is, I can't even open the browser to download IE11! What now? I do manage to download and install Chrome, and with it I get the desired IE11. Unfortunately, after step 9 (the last reboot) I still get the update error. On the other hand, if I afterwards do what I found here, it works! So my recipe to get rid of the error is as follows:

  1. Install IE11. I got it from Chrome.
  2. Remove the two hot fixes KB2534111, KB2639308
  3. Download KB3102810, reboot, install the patch.
  4. Stop the Windows update service via services.msc, restart it.

Of course this is my solution, so try it under your own risk 🙂

Code 43 Error: Windows has stopped this device

Ay ay ay Windows! It is so nice but so picky… sometimes it doesn't let you work with devices that were not with it from the very installation time. I found this Code 43 Error when I decided to add three partially new GeForce 1080 Ti to my original server installation (which I performed myself, after a lot of hassle with the fans). Since neither the server nor the cards were new, I quickly googled the error.

And here is the solution to the issue from drivereasy.com, from where I also took the pic. Basically it involves uninstalling the device and scanning for hardware changes. Unfortunately, the solution didn't last long, and I was forced to do it again and again, until I found the source of my problem, which I will share with you. The server was not originally designed to consume so much power, so the issue was that one of the cards was not getting enough energy. Instead of filling up all the PCIe slots, I removed some things… until the system was able to cope with my 3 GPUs. Luckily I had spare PCIe cards in! Ay ay ay…

CryoSPARC 2 management notes

You know how to install CryoSPARC 2 on CentOS 7. There are some simple commands that I keep repeating, and some complicated ones that I need to run from time to time, since I keep adding and deleting nodes.

To add a worker cryo01 with ssd on /data to master cryomaster:

cryosparcw connect --update --worker cryo01 --master cryomaster --ssdpath /data

To add a worker cryo01 to a new lane fastlane:

bin/cryosparcw connect --worker cryo01 --master cryomaster --ssdpath /data --lane fastlane --newlane

Here (chapter 2.7) you can learn how to select a lane.

To remove the lane fastlane (see post here):

cryosparcm cli "remove_scheduler_lane('fastlane')"

To remove the worker sparc0:

cryosparcm cli 'remove_scheduler_target_node("sparc0")'

To create a user:

cryosparcm createuser --email user@domain.edu --password CLEARTEXT-PASSWORD --name "John Doe"

To remove a user (see post here), from the interactive shell:

cryosparcm icli
db['users'].delete_one({'user.0.domain': 'THE-USER-DOMAIN'})  # value is a placeholder

So user and worker management is kind of tricky. I look forward to the version that will let us manage resources and workload directly. I'll keep you posted 🙂

Giving root power to a CentOS 7 user

This is an old one. I was explicitly avoiding going down this hole, but the time has come. We need to run a script that copies data owned by root from storage A to storage B. We don't want to change the permissions or the data ownership, nor do we want to run it from a crontab. Solution: allow a normal user to run the script as root. It is not so complicated if you know how to do it.

We have tested the script, and it runs fine as root. I will place the script at /home/admin/bin/myscript.sh, accessible only to root. What the script does is irrelevant for the post; it could be a simple copy or rsync. In my case, it checks that the folder is properly named, that the data is not currently being transferred, and that the destination folder does not already exist. Once we are happy with the script, we simply type visudo as root on the computer chosen for the data transfer task. We will see a file full of explanations that physically lives at /etc/sudoers. IMPORTANT: you need to edit it with visudo or your changes will not work!
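As an illustration only, here is a minimal sketch of what such a guarded script could look like, written as a shell function. The base paths, the naming rule, and the "recently modified" heuristic are all my assumptions, not the actual script.

```shell
# Hypothetical sketch of myscript.sh as a function. SRC_BASE and DST_BASE
# default to placeholder paths and can be overridden for testing.
myscript() {
    local src_base=${SRC_BASE:-/storageA}
    local dst_base=${DST_BASE:-/storageB}
    local name=$1
    # 1. folder must match an expected naming scheme
    [[ $name =~ ^[A-Za-z0-9_-]+$ ]] || { echo "bad folder name" >&2; return 1; }
    # 2. refuse if the destination folder already exists
    [ -e "$dst_base/$name" ] && { echo "destination exists" >&2; return 1; }
    # 3. refuse if anything changed in the last minute (transfer running?)
    if [ -n "$(find "$src_base/$name" -mmin -1 -print -quit 2>/dev/null)" ]; then
        echo "transfer in progress" >&2
        return 1
    fi
    cp -a "$src_base/$name" "$dst_base/"    # or rsync -a
}
```

The three guards mirror the checks described above; everything else (error messages, the one-minute window) is made up for the sketch.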

Let's say we want to let the users alpha and beta run myscript.sh. Both are AD users, by the way. However, we give them access only from one machine, where we open visudo. We edit the file like we do with vi, pressing i (for insert) and :wq to write and quit. At the end of the sudoers file we add

alpha ALL= NOPASSWD: /home/admin/bin/myscript.sh
beta ALL= NOPASSWD: /home/admin/bin/myscript.sh

We save the file and test that the script works for them as it does for root. Obviously this is not the most effective way if we want a lot of people to run our script, but in principle we don't want a lot of people moving data around. Or do we?
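To double-check the rule before trusting it, visudo can validate a standalone file, and sudo -l lists what a user may run. The scratch path below is just for the syntax check, not /etc/sudoers itself.

```shell
# Write the two rules to a scratch file and let visudo check the syntax
cat > /tmp/myscript-rules <<'EOF'
alpha ALL= NOPASSWD: /home/admin/bin/myscript.sh
beta ALL= NOPASSWD: /home/admin/bin/myscript.sh
EOF
command -v visudo >/dev/null && visudo -c -f /tmp/myscript-rules || true
# On the target machine, as alpha:
# sudo -l                              # should list the NOPASSWD entry
# sudo /home/admin/bin/myscript.sh     # should run without a password
```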

Install EMAN2.2 with cmake 3.13 on CentOS 7

I'm not going to bloat the post with personal opinions about how good the installation method is. Here you get the EMAN2 sources. This is the HOWTO install, and from there I did the compilation with anaconda. As you may know from previous posts, we use modules. First we load our python module and upgrade conda as suggested by the error message, if you get it. Then we go ahead with the instructions. I do that in a new window.

module load python-2.7.13
conda install eman-deps=13 -c cryoem -c defaults -c conda-forge

In my case, 27 new packages are installed, 14 packages are updated, and 16 are downgraded, meaning around 40 packages are downloaded and extracted. Since we do that on top of the python module, I (hopefully) need to do this only once. It takes like 10 minutes, but multiply that by 30! Now I cmake what I got from github. As expected, I get a cmake version error.

eman22-build > cmake ../eman22-src/ -DENABLE_OPTIMIZE_MACHINE=ON
CMake Error at CMakeLists.txt:1 (CMAKE_MINIMUM_REQUIRED):
CMake 3.9 or higher is required. You are running version 3.8.2

-- Configuring incomplete, errors occurred!

Let's download a new cmake then, and make a module for it. I get the cmake binary for my linux, cmake-3.13.3-Linux-x86_64.tar.gz, so I don't need to compile it. The unpacked folder I place at /network/cmake-3.13.3/. My module then looks like this:

## modules cmake-3.13.3
## modulefiles/cmake-3.13.3 cmake module
proc ModulesHelp { } {
global version modroot
   puts stderr "cmake-3.13.3"
}
module-whatis "Sets the environment for using cmake-3.13.3"

# for Tcl script use only
set topdir /network/cmake-3.13.3
set version 3.13.3
set sys linux86

setenv CMAKE_V "3.13.3"

prepend-path PATH $topdir/bin
prepend-path MANPATH $topdir/man
prepend-path LD_LIBRARY_PATH $topdir/lib
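For what it's worth, loading such a module is essentially just this PATH manipulation (using this post's example location):

```shell
# What "module load cmake-3.13.3" effectively does to PATH;
# /network/cmake-3.13.3 is the example location from above
export PATH=/network/cmake-3.13.3/bin:$PATH
# afterwards `cmake --version` should report 3.13.3
```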

Time to cmake again. I do it on a new terminal. First I load both modules, my python and my cmake. Then I just follow the instructions…

cmake ../eman22-src/ -DENABLE_OPTIMIZE_MACHINE=ON

And make -j 4, make install. OK! The final module (eman22 installed on /network/eman22-build) looks like this:

#################
## modules eman22-2019.1
## modulefiles/eman22-2019.1 EMAN module
##
proc ModulesHelp { } {
global version modroot
   puts stderr "eman22-2019.1"
}
module-whatis "EMAN 22 2019.1"
# for Tcl script use only
set topdir /network/eman22-build
set version 2019.1
set sys linux86

# additional env variables 
set emandir /network/eman1.9
set pythondir $topdir/lib

setenv EMAN2_V "2019.1"
setenv HDF5_DISABLE_VERSION_CHECK 1
setenv EMAN2DIR $topdir
setenv EMANDIR $emandir
setenv PYTHONPATH $pythondir

prepend-path PATH $topdir/include
prepend-path PATH $topdir/bin
prepend-path MANPATH $topdir/man
prepend-path LD_LIBRARY_PATH $topdir/lib:/usr/lib64

module load python-2.7.13

The first test, loading the module as a user and running e2version.py, tells me what I want to see.

user@computer ~/test $ > module load eman22-2019.1 
user@computer ~/test $ > e2version.py 
EMAN 2.22 final (GITHUB: 2019-01-23 15:14 - commit: 04f6f33 )
Your EMAN2 is running on: Linux-3-XXXXX-YYYYY-ZZZZ
Your Python version is: 2.7.14

We are done! See you on the other side, soon, I promise…