# OSX ipython UnknownBackend %matplotlib unable to use

I’m continuing with my data science experiments. If you are also following a text instead of learning by doing, you may have found that you cannot use matplotlib the way the text suggests. The line

%matplotlib inline

Produces a long dump that ends with

UnknownBackend: No event loop integration for 'inline'.
Supported event loops are: qt, qt4, qt5, gtk, gtk2, gtk3, tk,
wx, pyglet, glut, osx

You can, if you must, skip the inline command and save your plot “plt” using savefig.

In [47]: plt.savefig('scatter.png')

This saves your plot as a PNG named ‘scatter.png’ in the folder where you started ipython. But we don’t want to save and check at each step. We want to see the plot first. The solution, like all good solutions, is easy once you know it. Instead of:

In [41]: %matplotlib inline

You write:

In [42]: %matplotlib osx
In [43]: import matplotlib.pyplot as plt
In [44]: import seaborn; seaborn.set()
In [45]: plt.scatter(X[:, 0], X[:, 1]);  

Now your plots will be displayed in a separate window. You’re welcome 🙂

# A ddrescue on CentOS 7

Sorry, no literature yet. I can’t find the time to write down my dreams, but they are there. I’ve been having headaches trying to recover a dying Windows 7 system that runs vital hardware. Since I am a Linux person, of course I first try dd, but I can’t even read the disk. The Windows tools don’t help either: I tried checking the disk (Properties->Check) and repairing disk sectors with CHKDSK and, oddly enough, everything looks fine on the system disk. But I know the disk is damaged. There is a Wikipedia entry about recovering damaged disks, and I try its suggestion:

dd if=/dev/old_disk of=/dev/new_disk conv=noerror,sync
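To see why a clone made this way can be unusable, here is a minimal Python sketch of what `conv=noerror,sync` amounts to: every block that cannot be read ends up as zeros in the copy. The function `copy_with_zero_fill` and the `FlakyDisk` stand-in are made up for illustration; this is not how dd itself is implemented.

```python
import io


def copy_with_zero_fill(src, dst, total_size, block_size=512):
    """Copy total_size bytes from src to dst, block by block.

    Blocks that raise OSError on read are written as zeros, which mimics
    the effect of dd's conv=noerror,sync. Returns the number of bad blocks.
    """
    bad_blocks = 0
    for offset in range(0, total_size, block_size):
        length = min(block_size, total_size - offset)
        try:
            src.seek(offset)
            chunk = src.read(length)
        except OSError:
            chunk = b""  # nothing could be read from this block
            bad_blocks += 1
        # Pad failed or short reads with zeros so offsets stay aligned.
        dst.write(chunk.ljust(length, b"\x00"))
    return bad_blocks


class FlakyDisk(io.BytesIO):
    """Hypothetical stand-in for a damaged disk: some offsets fail to read."""

    def __init__(self, data, bad_offsets):
        super().__init__(data)
        self.bad_offsets = set(bad_offsets)

    def read(self, size=-1):
        if self.tell() in self.bad_offsets:
            raise OSError("simulated I/O error")
        return super().read(size)


if __name__ == "__main__":
    disk = FlakyDisk(b"A" * 1024 + b"B" * 512, bad_offsets={512})
    clone = io.BytesIO()
    print(copy_with_zero_fill(disk, clone, total_size=1536), "bad block(s)")
```

The copy has the right size and layout, but whatever lived in the bad blocks (file system structures, boot files) is silently replaced by zeros.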

The process finishes despite the I/O errors, but the cloned disk is useless, since the errors were simply skipped during the copy (noerror). The entry points to ddrescue as a solution, although from the text it looks like a Linux-only tool. Anyway, I want to try it. How do I run it, and how long does it take for a big disk?

 ## > ddrescue -f -n /dev/sdd /dev/sdc /root/rescue.log
GNU ddrescue 1.22
ipos: 1500 GB, non-trimmed: 0 B, current rate: 0 B/s
opos: 1500 GB, non-scraped: 1498 GB, average rate: 164 kB/s
non-tried: 0 B, bad-sector: 2048 B, error rate: 75302 kB/s
rescued: 1922 MB, bad areas: 4, run time: 3h 14m 40s
pct rescued: 0.12%, read errors: 22863560, remaining time: n/a
time since last successful read: 3h 14m 14s
Finished
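To get a feeling for the time scale, here is a rough Python estimate based on the numbers in the status screen above. It assumes decimal units (1 kB = 1000 B) and a constant average rate, which a failing disk will certainly not respect; `estimate_days` is just an illustrative helper.

```python
def estimate_days(remaining_bytes, rate_bytes_per_s):
    """Days needed to read remaining_bytes at a constant rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return remaining_bytes / rate_bytes_per_s / 86400


GB = 10**9
KB = 10**3

# Figures from the first ddrescue pass: 1498 GB non-scraped, 164 kB/s average.
days = estimate_days(1498 * GB, 164 * KB)
print(f"about {days:.0f} days at the average rate")
```

At that average rate, going through everything that is left would take on the order of a hundred days, so the "remaining time: n/a" is not that surprising.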

I run it on a 1.5 TB disk, aiming to end up with a usable clone. This first pass took about 3 hours to complete. Here is the meaning of all the ddrescue output, so you can check how long until it is done. By the way, it seems that there is no easy way to speed up the process. Now we try to copy only the failed areas, with 3 retries, like this:

## > ddrescue -d -f -r3 /dev/sdd /dev/sdc /root/rescue2.log
GNU ddrescue 1.22
ipos: 256570 MB, non-trimmed: 0 B, current rate: 0 B/s
opos: 256570 MB, non-scraped: 1243 GB, average rate: 0 B/s
non-tried: 0 B, bad-sector: 256570 MB, error rate: 310 kB/s
rescued: 0 B, bad areas: 2, run time: 2d 21h 6m
pct rescued: 0.00%, read errors:524007765, remaining time: n/a
time since last successful read: n/a
Scraping failed blocks... (forwards)^C

Unfortunately, after more than 2 days, the copied disk is still not fully baked, and I give up. So I interrupt the run and go over the clone with smartctl. The output for the clone looks promising.

smartctl -a /dev/sdc
smartctl 5.40 2010-03-16 r3077 [x86_64-unknown-linux-gnu]
(local build)
Copyright (C) 2002-10 by Bruce Allen,
http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F2 EG series
Device Model:     SAMSUNG HD502HI
Serial Number:    S1VZJ9CS712490
Firmware Version: 1AG01118
User Capacity:    500,107,862,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3b
Local Time is:    Wed Feb  9 15:30:42 2011 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED


I happily take my clone to the PC, put it in, and start the computer, hoping to reach a running Windows 7 system. I’m not lucky this time, but by using a combination of system recovery options on my clone that I’m not sure about (so I don’t post them), I do end up with something that seems stable enough. Rest assured that I will edit this post if that turns out not to be the case! And if it is the case, have a nice day 🙂

# The Standard Model (SM) versus the Generation Model (GM)

As a Particle Physicist, I was always fascinated by the Standard Model. All these quarks, all these quantum numbers, were so beautiful and at the same time so complicated.

In the SM we have six leptons: electron (e), electron neutrino (νe), muon (μ), muon neutrino (νμ), tau (τ) and tau neutrino (ντ). They are elementary particles and they were found when the SM was still not completely drawn. The elementary leptons are complemented by six quarks: up (u), down (d), charm (c), strange (s), top (t) and bottom (b). In total, we have twelve particles, all with spin 1/2 and a set of additive quantum numbers.

To me, these quantum numbers seem somewhat arbitrary: the charge Q, lepton number L, muon lepton number Lμ, tau lepton number Lτ, baryon number A, strangeness S, charm C, bottomness B and topness T (see Original paper with the tables). For each particle additive quantum number N, the corresponding antiparticle has the additive quantum number −N. This model, although it hasn’t failed so far, does not answer all the questions. There is no room in the SM for dark matter or dark energy, now that the supersymmetric partners of the twelve guys on Table 1 are nowhere to be found. I’ve been in a lot of meetings where dark matter and dark energy were discussed, and in none of them did I see anything but a rewrite of the standard story. The Generation Model, on the other hand, seems to be a fresh gauge approach where, by changing the point of view, one reaches an explanation for everything, including dark matter. You can read the GM paper here, and judge for yourself. Basically, by considering the twelve particles as composites of rishons, a lot of blanks are filled.

The so-called Harari-Shupe Model (HSM) uses only two rishons, labeled T with charge Q = +1/3 and V with charge Q = 0, and their corresponding antiparticles, labeled T̄ with charge Q = −1/3 and V̄ with Q = 0, to construct the leptons, quarks and their antiparticles. What do we gain by doing so? Its proponents claim that, with this approach, the matter-antimatter asymmetry is gone and that there is no need to include dark matter to explain the gravitational effects we see at long distances, which are currently explained with dark matter halos.
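As a quick sanity check of the rishon charges, here is a small Python sketch that adds them up for the first-generation particles. The three-rishon assignments (TTT for the positron, TTV for the up quark, and so on) are the ones usually quoted for the Harari-Shupe scheme; they are not spelled out in this post, so take them as an assumption, as is the lowercase notation for antirishons.

```python
from fractions import Fraction

# Rishon charges in units of the positron charge.
# A lowercase letter stands for the antirishon (notation made up here).
CHARGE = {
    "T": Fraction(1, 3),
    "V": Fraction(0),
    "t": Fraction(-1, 3),  # anti-T
    "v": Fraction(0),      # anti-V
}

# Three-rishon composites as usually quoted for the Harari-Shupe scheme.
COMPOSITES = {
    "positron": "TTT",
    "up quark": "TTV",
    "anti-down quark": "TVV",
    "electron neutrino": "VVV",
    "electron": "ttt",
    "anti-up quark": "ttv",
    "down quark": "tvv",
}


def charge(rishons):
    """Total electric charge of a composite, as an exact fraction."""
    return sum((CHARGE[r] for r in rishons), Fraction(0))


for name, rishons in COMPOSITES.items():
    print(f"{name:18s} {rishons}  Q = {charge(rishons)}")
```

With only ±1/3 and 0 as building blocks, the familiar charges ±1, ±2/3, ±1/3 and 0 all come out as sums of three terms.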

Everything is much more beautiful with the GM. But I still need to believe the math behind it… and its experimental results, of course 😀

# Install Python 3 on OSX for data science

While I edit The Water Wedding as a book I keep working, of course. This is also my log. I’m following these instructions to build a proper environment on my Mac to test some data science tools. First, I open a terminal and check my Python version.

mymac:~ user$ which python
/usr/bin/python
mymac:~ user$ python --version
Python 2.7.10
mymac:~ user$ python3
-bash: python3: command not found
mymac:~ user$ xcode-select --install

This opens up the Apple store and starts installing xcode. It takes some time. But it was expected. The next one is brew. This is my output:

mymac:~ user$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
==> This script will install:
/usr/local/bin/brew
/usr/local/share/doc/homebrew
/usr/local/share/man/man1/brew.1
/usr/local/share/zsh/site-functions/_brew
/usr/local/etc/bash_completion.d/brew
/usr/local/Homebrew
==> The following existing directories will be made group writable:
/usr/local/bin
/usr/local/share
/usr/local/share/man
/usr/local/share/man/man1
==> The following existing directories will have their owner set to user:
/usr/local/bin
/usr/local/share
/usr/local/share/man
/usr/local/share/man/man1
==> The following existing directories will have their group set to admin:
/usr/local/bin
/usr/local/share
/usr/local/share/man
/usr/local/share/man/man1
==> The following new directories will be created:
/usr/local/etc
/usr/local/include
/usr/local/lib
/usr/local/sbin
/usr/local/var
/usr/local/opt
/usr/local/share/zsh
/usr/local/share/zsh/site-functions
/usr/local/var/homebrew
/usr/local/Cellar
/usr/local/Homebrew
/usr/local/Frameworks
==> Installation successful!
==> Homebrew has enabled anonymous aggregate formulae
Read the analytics documentation (and how to opt-out) here:
https://docs.brew.sh/Analytics
==> Homebrew is run entirely by unpaid volunteers.
https://github.com/Homebrew/brew#donations
==> Next steps:
- Run brew help to get started
- Further documentation:
https://docs.brew.sh

So far so good. Let’s test it.

mymac:~ user$ brew doctor
Error: You have not agreed to the Xcode license. Please resolve this by running:
sudo xcodebuild -license accept
mymac:~ user$ xcodebuild -license accept
mymac:~ user$ sudo xcodebuild -license accept
Password:

Another test.

mymac:~ user$ brew doctor
Your system is ready to brew.

We brew python3 now. Output also written as a reference. Colours are mine.

mymac:~ user$ brew install python3
==> Installing dependencies for python: gdbm, openssl, readline, sqlite and xz
==> Installing python dependency: gdbm
==> Downloading https://XXXbottle.1
################################################## 100.0%
==> Pouring gdbm-1.18.1.mojave.bottle.1.tar.gz
🍺 /usr/local/Cellar/gdbm/1.18.1: 20 files, 586.8KB
==> Installing python dependency: openssl
==> Downloading https://homebrewXXX.mojave.bottl
################################################## 100.0%
==> Pouring openssl-1.0.2q.mojave.bottle.tar.gz
==> Caveats
A CA file has been bootstrapped using certificates from the SystemRoots
keychain. To add additional certificates (e.g. the certificates added in
the System keychain), place .pem files in
/usr/local/etc/openssl/certs
and run
/usr/local/opt/openssl/bin/c_rehash
openssl is keg-only, which means it was not symlinked into /usr/local,
because Apple has deprecated use of OpenSSL in favor of its own TLS and crypto libraries.
If you need to have openssl first in your PATH run:
echo 'export PATH="/usr/local/opt/openssl/bin:$PATH"' >> ~/.bash_profile
For compilers to find openssl you may need to set:
export LDFLAGS="-L/usr/local/opt/openssl/lib"
export CPPFLAGS="-I/usr/local/opt/openssl/include"
==> Summary
🍺  /usr/local/Cellar/openssl/1.0.2q: 1,794 files, 12.1MB
...some stuff here...
==> Summary
==> Installing python dependency: sqlite
...some stuff here...
==> Summary
🍺  /usr/local/Cellar/sqlite/3.27.1: 11 files, 3.7MB
==> Installing python dependency: xz
...some stuff here...
==> Pouring xz-5.2.4.mojave.bottle.tar.gz
🍺  /usr/local/Cellar/xz/5.2.4: 92 files, 1MB
==> Installing python
...some stuff here...
==> Caveats
Python has been installed as
/usr/local/bin/python3
Unversioned symlinks python, python-config, pip etc.
pointing to python3, python3-config, pip3 etc.,
respectively, have been installed into
/usr/local/opt/python/libexec/bin
If you need Homebrew's Python 2.7 run
brew install python@2
You can install Python packages with
pip3 install <package>
They will install into the site-package directory
/usr/local/lib/python3.7/site-packages
See: https://docs.brew.sh/Homebrew-and-Python
==> Summary
🍺  /usr/local/Cellar/python/3.7.2_2: 3,861 files, 59.7MB
mymac:~ user$ python3 --version
Python 3.7.2

The same caveats shown for openssl (the keychain note and the export PATH note) appear for readline, sqlite, and xz. I have edited the output so it’s not enormous. Now we install conda with pip3.

mymac:~ user$ pip3 install conda

Installing collected packages: pycosat, certifi,
idna, urllib3, chardet, requests, ruamel.yaml, conda

Successfully installed certifi-2018.11.29 chardet-3.0.4
conda-4.3.16 idna-2.8 pycosat-0.6.3 requests-2.21.0
ruamel.yaml-0.15.88 urllib3-1.24.1

And finally, the data science packages. This way:

pip3 install numpy pandas scikit-learn matplotlib seaborn jupyter

It takes some time, but we are ready to continue with the tutorial of the Python Data Science Handbook. We’ll see where we hit the next stone…
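Before going back to the tutorial, a quick check that the packages are really visible to python3 can save some head-scratching. This is a minimal sketch using only the standard library; the list of import names is my assumption (scikit-learn imports as sklearn, and I check jupyter_core for jupyter), and `missing_packages` is just a helper name.

```python
import importlib.util
import sys

# Import names of the packages installed above
# (scikit-learn imports as sklearn; jupyter pulls in jupyter_core).
PACKAGES = ["numpy", "pandas", "sklearn", "matplotlib", "seaborn", "jupyter_core"]


def missing_packages(names):
    """Return the names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_packages(PACKAGES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run it with python3, not python, otherwise you will be asking the system Python 2.7 about packages it never got.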

# Impostor Syndrome

I heard about it in the movies, I read about it. The person standing in front of my mirror is not the free spirit who started studying Physics, but a businessman and pater familias. This is what I have become, little by little, step by step. I’m going through the (in)famous midlife crisis! My spark is lost, the joy is gone. I should be happy because I have almost reached my goals, so why am I not? I have an office all to myself, a more-or-less respected position at a top research institute, and a lot of ideas. But is this what I wanted to achieve?

Probably yes. So where is the catch? Why am I complaining right now? Maybe I should analyze what is missing on this moon that I managed to land on. Acknowledgement I get from time to time. Not daily, that would be excessive, but it is not rare. Friends I have. Not as close as I’d like, not as many, but I have them. Free time? I have responsibilities, of course, but I could arrange things to make time if needed. What is it, then? Envy, maybe? The lack of clear goals, of clear challenges? Is it the fact that I’m not advancing in my career? I would say all of them, and none of them. Because I’m sure I can set myself goals, like learning quantum computation, and I will still not be satisfied. And maybe that’s OK; that means I’m still human after all. The problem is, I believe, that I can’t digest what I have become. Hence the impostor syndrome. I still think of myself as a free spirit, not an adult with rules and tasks and a public image. I’m a cyborg but that’s OK 😀

# Windows batch script tips

I’m on fire again. I want to automate data transfers on Windows. For that, I need to write a script. Instead of going for Python, this time I will try a Windows batch script. This is just a list of tips on how to achieve certain tasks…

Open a txt file and save it as test.bat on your Desktop. You should now see it as an executable (gear-wheel icon); if not, check that the extension is really bat and not txt. This will be our batch script. Now we write in the file. Let’s start with the basics. For me, that means how to comment your code (::) and how to output messages (echo). This snippet should be self-explanatory:

:: This will print a message on your command prompt
echo " Executing the bat file"

If you double-click on test.bat you should see a black Command Prompt window flash by. If we want the window to stay open, we just add pause at the end of the file.

Let’s get serious. We want to ask the user to input the folder path. This means declaring a variable and asking for keyboard input. This will do it:

SET /p choice= "Please type the name of the folder :"
:: echo '%choice%'

It can be useful to concatenate the variable with a default path ROOT. In two lines:

SET ROOT=M:\DATA
SET SRC_ROOT=%ROOT%\%choice%

We could print our new concatenated variable with echo, but there is no need at this moment. Let’s look for a command that lists all files in a folder and its subfolders. The resulting lines are:

echo "Listing files in folder"
DIR /S %SRC_ROOT% /B /O:GN

We want to link the listed files with mklink, so that we have symbolic links in another folder. In principle, what we want is the same tree structure in another place. We may want that because we need the files to be “available” in a specific folder without copying them. In one line, we do it like this:

MKLINK /D C:\LinkToFolder %SRC_ROOT%

We have, of course, used our SRC_ROOT folder as input. There are a lot of MKLINK examples on HowToGeek. We could, in principle, take the list of files, select those matching a pattern, loop over them, and link only those. But I will leave that to the reader 😀
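For readers who would rather not do that exercise in batch, here is a minimal Python sketch of the same idea: walk the source tree, keep the files matching a pattern, and recreate the tree elsewhere with symbolic links. The function name `link_matching` and the default pattern `*.dat` are my own choices, not something from the batch script above.

```python
import fnmatch
import os


def link_matching(src_root, dst_root, pattern="*.dat"):
    """Mirror src_root under dst_root with symlinks to files matching pattern.

    Returns the sorted list of links created. On Windows, creating
    symlinks may need administrator rights or Developer Mode.
    """
    created = []
    for folder, _subfolders, files in os.walk(src_root):
        for name in fnmatch.filter(files, pattern):
            source = os.path.join(folder, name)
            # Rebuild the same relative path below dst_root.
            relative = os.path.relpath(source, src_root)
            link = os.path.join(dst_root, relative)
            os.makedirs(os.path.dirname(link), exist_ok=True)
            if not os.path.lexists(link):
                os.symlink(os.path.abspath(source), link)
                created.append(link)
    return sorted(created)
```

A call like `link_matching(r"M:\DATA\run1", r"C:\LinkToFolder")` (paths here are only examples) would link every .dat file while leaving everything else behind. Keep the destination outside the source tree, or the walk will start finding its own links.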

# P-hacking

I’m not doing heavy processing, but I’m surrounded by people who do. They have terabytes and terabytes of data that they analyze with somewhat black-box tools that can be tuned with multiple parameters. Then the results are plotted with different programs. In an email from GraphPad, one of the companies that sell statistical analysis programs, I was asked about the complexity of my data sets and about how frequently I encounter P-hacking in my data.

P-Hacking says that, if you try hard enough, eventually ‘statistically significant’ findings will emerge from any reasonably complicated data set.

In other words, if you know what you are looking for (for example P < 0.05), you can end up cooking up a procedure to find it. I consider myself a bit of a data scientist and I fully agree with the possibility of inadvertently influencing the output of your analysis. I think the biggest example of this is electoral polls. They are commissioned by governments, and nobody wants to bite the hand that feeds them… in principle. How could I avoid P-hacking if I’m part of the analysis procedure? I believe the solution is easier than reviewing all the analysis steps: just ask someone else to analyze your data. The problem will then be what to do if the other person gets a result significantly different from yours. And that truth is in the eye of the beholder. Or something like that 🙂
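To make the definition concrete, here is a small Python simulation using only the standard library. It compares two groups of pure noise many times and counts how often the comparison comes out "significant". The setup is my own toy assumption (a two-sided z-test with known standard deviation 1, groups of 30 samples), not anything taken from the GraphPad email.

```python
import math
import random


def p_value(z):
    """Two-sided p-value of a standard normal statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_tests=2000, n_samples=30, alpha=0.05, seed=0):
    """Fraction of 'significant' results when comparing pure noise.

    Each test compares the means of two groups drawn from the same
    normal distribution (known standard deviation 1), so every
    significant result is a false positive.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tests):
        a = [rng.gauss(0, 1) for _ in range(n_samples)]
        b = [rng.gauss(0, 1) for _ in range(n_samples)]
        diff = sum(a) / n_samples - sum(b) / n_samples
        # Under the null hypothesis, diff has standard deviation sqrt(2/n).
        z = diff / math.sqrt(2 / n_samples)
        if p_value(z) < alpha:
            hits += 1
    return hits / n_tests


if __name__ == "__main__":
    print(f"false positive rate: {false_positive_rate():.3f}")
```

The rate should hover around 5%, which is exactly the point: with 20 independent looks at the same noise, the chance of at least one P < 0.05 is 1 − 0.95^20, about 64%.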