We have a new batch of these guys at home. They are really gorgeous machines: you can get a GPU server + data together for a few thousand, excluding the price of the GPU, the RAM, and the hard disks. The connectivity is great, and they look clean when you open them, unlike some SysGen servers I have seen (I don’t want to name names) which, regardless of their final performance, look cramped with components, heavy, and noisy. The finish is plastic, except for the cover, but even so, it looks beautiful.
The user experience is also very good. But I don’t want to sell you one. What I want to sell you is their remote management. There is a form that, once filled in, allows a machine under warranty to “call home” when it has a problem. If you enable that, the customer service for your area will contact the person named in the form and tell them what to do to repair the server, even if you hadn’t noticed the problem. What the hell, they even sell you the spare part (for example, a RAM module) if you state that you are brave enough to replace it yourself. So far so good. The problem is that the issue is not always solved, and since IBM has a big customer support center, your tickets may get crossed in cyberspace or something, and you end up repairing something that is no longer giving you errors. Yes, complaining is free. The fact is, one of our servers had its motherboard replaced by a technician a week ago, and although I think he did everything right, the server is still not fully operational. That is, it doesn’t work. And I’m still exchanging tickets with support.
If this happens to only one in 300 servers, that is fine, but I don’t have the statistics. We will see; I will edit this post when the problem is solved.
EDIT: problem solved. A technician came and replaced the riser cards and one CPU. All of that after already replacing 3 RAM modules and the motherboard. But if it works, it works. What was interesting was the way the logs were collected. I was asked to run a script, called
The output looks like this:
Lenovo Dynamic System Analysis
(C) Copyright Lenovo Corp. 2004-2015.
(C) Copyright IBM Corp. 2004-2015. All Rights Reserved.
[...something here...]
Extracting...
Executing...
Logging level set to Status
Copying Schema...
cp: cannot stat ‘/etc/sysconfig/network/*-usb*’: Not a directory
cp: cannot stat ‘/etc/sysconfig/network-scripts/*-usb*’: No such file or directory
Dynamic System Analysis Version 10.2.A5Z
(C) Copyright Lenovo Corp. 2004-2015. All Rights Reserved.
(C) Copyright IBM Corp. 2004-2015. All Rights Reserved.
Running DSA IMM plug-ins pass 1.
IMM: Integrated Management Module Collector
Running DSA IMM plug-ins pass 2.
IMM: Integrated Management Module Collector
Running DSA IPMI plug-ins pass 1.
Running DSA IPMI plug-ins pass 2.
Running DSA collector plug-ins pass 1.
... some stuff here...
...
Adding DSA log entries to XML file.
Writing XML data to file /var/log/Lenovo_Support/SOMETHING_LONG.xml.gz
DSA capture completed successfully.
cp: failed to access ‘/etc/sysconfig/network/’: Not a directory
cp: cannot stat ‘*-usb*’: No such file or directory
Please press ANY key to continue ...
I don’t know whether this output was useful or not, but I’m happy my cluster is complete again 🙂