Force Mellanox ConnectX-4 Lx 25Gbps to 10Gbps speed

Introduction

As you might remember, I wrote a blog post about SFP+ and SFP28 compatibility. In it I discuss future proofing your network investments and not having to upgrade everything all at once. One example is that buying 25Gbps NICs when your main network infrastructure is still on 10Gbps is not an issue. 25Gbps NICs normally handle 10Gbps well, so you don’t have to replace all parts of the fabric at the same time; you can start with either the network fabric or the server NICs. It’s a way of future proofing the investments you make today.

When installing Mellanox ConnectX-4 Lx 25Gbps NICs in a bunch of servers we hit an issue when we connected them to the DELLEMC N4000 10Gbps switches. The intent is to replace these switches with 25/50/100Gbps models in the future.

The links did not come up. The switch ports are normally forced to 10Gbps in our setups, so we checked that first. The speed was indeed fixed at 10Gbps. When we changed that to auto-negotiate, the link came up at 1Gbps.

Naturally you check everything from the cabling to the transceivers used (BCM84754 on the switches), but that all checked out. We also checked the firmware on the switches to determine whether it was up to date and whether a newer version fixed a known issue related to this. But no, everything was up to date on both the switches and the NICs.

Note that these links worked fine when used with 10Gbps cards like the ConnectX-3 Pro. The DELL branded transceivers on the switches were BCM84754 (Broadcom).

The fix: Force Mellanox ConnectX-4 Lx 25Gbps to 10Gbps speed

I do not need to tell you that when you want 10Gbps, getting 1Gbps doesn’t fly. The fix was easy enough. We put the switch ports back to 10Gbps fixed speed, since auto-negotiate doesn’t deliver. No worries, we fix the port speed anyway. We then used the Mellanox mlx5cmd.exe tool to change the NIC ports from auto-negotiate to fixed.

On hosts with Mellanox ConnectX-4 NICs, open an elevated command prompt.

Navigate to C:\Program Files\Mellanox\MLNX_WinOF2\Management Tools. Run the below command to check the current link speed.

mlx5cmd.exe -LinkSpeed -Name "MyNicName" -Query

Note that both 10 and 25Gbps are supported, so the port is set to auto-negotiate.

We force the link speed to 10Gbps:

mlx5cmd.exe -LinkSpeed -Name "MyNicName" -Set 10

Link speed is forced to 10Gbps

The link comes up at 10Gbps

Likewise you can force the link to 25Gbps. If you want to change it back to the default you can force the link speed to auto-negotiate again.

mlx5cmd.exe -LinkSpeed -Name "MyNicName" -Set 0

See https://community.mellanox.com/s/article/mlx5cmd-exe for more information on this tool.
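
If you have to do this on several ports or hosts, the commands above are easy to generate. Below is a minimal Python sketch that only builds the mlx5cmd.exe command lines used in this post and prints them. The helper name and the NIC names are made up for illustration, the tool path is the one mentioned earlier, and nothing is executed.

```python
import subprocess  # only used here to format the command line for printing

# Path as mentioned in this post; adjust it if your WinOF-2 install differs.
TOOL = r"C:\Program Files\Mellanox\MLNX_WinOF2\Management Tools\mlx5cmd.exe"


def link_speed_command(nic_name, speed=None):
    """Build the mlx5cmd.exe argument list used in this post.

    speed=None queries the port, 10 forces 10Gbps,
    0 sets the port back to auto-negotiate.
    """
    args = [TOOL, "-LinkSpeed", "-Name", nic_name]
    if speed is None:
        args.append("-Query")
    else:
        args += ["-Set", str(speed)]
    return args


# Hypothetical NIC names; use the names your own hosts report.
for nic in ["SLOT 3 Port 1", "SLOT 3 Port 2"]:
    print(subprocess.list2cmdline(link_speed_command(nic, 10)))
```

Printing first and pasting into an elevated prompt keeps you in control. If you prefer to run the commands directly, passing the same argument list to subprocess.run should work, but test that on one port first.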

Do note that the switch port also needs to be set to 10Gbps fixed. The command will notify you when the switch ports are still on auto.

The change was done but still no uplink when the switch port isn’t fixed to 10Gbps.

Conclusion

So my statement holds true: the path to 25/50/100Gbps is one you can take step by step while future proofing. You might run into some issues, but these are normally fixable. I have shared with you how to fix failing or wrong speed negotiations on 25Gbps RDMA NICs (Mellanox ConnectX-4 Lx) when connecting to 10Gbps ports. I’m pretty sure the same holds true for other models. I have also had cards where things work out of the box, so don’t give up when you hit an issue. I hope this helps some of you out there.

Replay Manager Configure Server There was an error loading the configuration information.

When replacing a bunch of servers with new DELL R740s (Hyper-V clusters, file clusters, backup targets etc.) I ran into an issue with the DELL Replay Manager software. The servers leverage multiple DELL EMC Storage Center SANs. They have multiple ones for scale-out, redundancy, failover, multiple datacenters, …

With some of the servers I noticed that loading the information was slow, while most others were just fine. But with 4 of the servers the connection never actually happened. Connectivity was just fine, and the connectivity test confirmed this. As this had zero impact on the actual replays that were scheduled, it went unnoticed. But when you are adding and removing servers you might need to dive into Server Configuration, and that is where, after a minute, we got the error below.

Configure Server
There was an error loading the configuration information.
Error Message:
The request channel timed out while waiting for a reply after 00:01:00. Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. The time allotted to this operation may have been a portion of a longer timeout.

Notice that the GUI says connecting to our demo server82… but unless you need info from the server itself, you might still see the info it gets from the Storage Center SAN.

This is quite annoying as we need to be in there. So how do we fix this? I have some ideas, as I know this error from .NET WCF, but in this case I was looking for an easier way out, especially as I don’t have all the information about this 3rd party application. The good news is that it is easily fixed.

Fixing this

Replay Manager stores the replays, and the metadata about the replays it creates, on the SAN itself. That’s why you can still see those even when you can’t actually connect to the server. The list of servers you add and use in Replay Manager is stored locally where the client lives. This file is portable: just copy it from your profile and hand it to a colleague. No big deal.

Now the server configuration you do from the Replay Manager GUI tool itself is stored on each and every server where you have the Replay Manager service installed. You will find that file, ReplayManager.config.xml, under C:\ProgramData\Compellent\ReplayManager.

Make a copy to be sure and edit the original using a text editor that has elevated permissions so you can save your changes. In the example file of one server below, note that server82 (green) has 2 old Compellent SC entries (yellow) that are no longer in service. One SAN that cannot be found won’t exceed the time-out window, but it does slow the GUI down significantly. With 2 or more phantom SANs, looking for them takes so long that you get the time-out error.
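
The "make a copy" step is easy to script so you never skip it. Here is a minimal Python sketch, assuming you run it elevated on the server in question. Only the file location comes from this post; the function name and the timestamped backup name are my own choices for illustration.

```python
import shutil
import time
from pathlib import Path

# Location as given in this post.
CONFIG = Path(r"C:\ProgramData\Compellent\ReplayManager\ReplayManager.config.xml")


def backup_config(config_path):
    """Copy the config file next to itself with a timestamp suffix.

    Returns the path of the backup copy; the original is left untouched.
    """
    config_path = Path(config_path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = config_path.with_name(f"{config_path.name}.{stamp}.bak")
    shutil.copy2(config_path, backup)  # copy2 also keeps the file timestamps
    return backup


if __name__ == "__main__":
    if CONFIG.exists():
        print("Backup written to", backup_config(CONFIG))
    else:
        print("Config file not found:", CONFIG)
```

With a timestamp in the name you can run this before every edit without overwriting an earlier backup.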

The fix is easy: cut the key values for the old SANs out of the file and save it. You then restart the Replay Manager service on that server via an elevated command prompt (or use the GUI):
net stop ReplayManager
net start ReplayManager

Restart the Replay Manager service on the server you need to manage before connecting to the server again with the Replay Manager client GUI.

When you now close and relaunch the Replay Manager GUI and connect to the server, things will be a lot faster and certainly won’t time out anymore.

Conclusion

Maintain your environment. Try to remove any decommissioned Storage Center SAN from your server configurations in Replay Manager before you take it offline and dispose of it. If you forget and run into a slow-loading Replay Manager GUI or hit a time-out, don’t panic. Replay Manager is actually quite solid and recoverable. We have shown you how to fix this by editing the ReplayManager.config.xml file on the server you need to connect to but can’t. You basically just remove the references to the Storage Centers that no longer exist. I hope it helps some of you out there if you run into this. Feel free to reach out in the comments if you have any questions.

Dell Storage Manager Collector Update error: Error applying transforms. Verify that the specified transform paths are valid.

Introduction

This is a quick assist for those people who run into the following error when updating their DellEMC SC Series Dell Storage Manager Data Collector and/or Client.

Error applying transforms. Verify that the specified transform paths are valid.

The installer wants to find 1033.msi in the AppData\Local\Temp folder of your user profile, but it is not there. Only other files are.

When troubleshooting this error, Google might lead you to use various app cleaner tools or the like. This could work or not. It can also lead to new errors. The installer might now complain that updating is only for installed apps and require you to really uninstall the application. This could leave you with a non-functional application until you fix the mess.

The easy fix

The solution is easier. Just navigate to the following key in the Windows registry:

COMPUTER\HKEY_CLASSES_ROOT\Installer\Products

There you find the key for the Dell Storage Manager Client and/or the Dell Storage Manager Collector. In it you will find a Transforms value with the path that throws the error. Just delete that value from the key.

Now run your Dell Storage Manager Data Collector and/or Client installers again and things should go well. As always, take a VM checkpoint or another type of backup before you do any work on a production server, or at least export the keys you modify so you can restore them.
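
Before deleting anything, it can help to list which installed products actually carry a Transforms value. The following is a read-only Python sketch under a few assumptions: it looks under HKEY_CLASSES_ROOT\Installer\Products, it assumes each product key has a ProductName value to match on, and the "Dell Storage Manager" filter text is my guess at the display name. It changes nothing in the registry.

```python
def find_transforms(products, name_filter="Dell Storage Manager"):
    """Return {subkey: transforms_path} for matching products.

    `products` maps a registry subkey name to a dict of its values.
    Only products whose ProductName contains `name_filter` and that
    have a Transforms value are returned.
    """
    hits = {}
    for subkey, values in products.items():
        name = values.get("ProductName", "")
        if name_filter.lower() in name.lower() and "Transforms" in values:
            hits[subkey] = values["Transforms"]
    return hits


def read_installer_products():
    """Read all values under Installer\\Products (Windows only, read-only)."""
    import winreg  # imported here so find_transforms also runs on other platforms

    products = {}
    with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, r"Installer\Products") as root:
        for i in range(winreg.QueryInfoKey(root)[0]):  # [0] = number of subkeys
            subkey = winreg.EnumKey(root, i)
            values = {}
            with winreg.OpenKey(root, subkey) as key:
                for j in range(winreg.QueryInfoKey(key)[1]):  # [1] = number of values
                    value_name, data, _value_type = winreg.EnumValue(key, j)
                    values[value_name] = data
            products[subkey] = values
    return products


if __name__ == "__main__":
    import sys

    if sys.platform == "win32":
        for subkey, path in find_transforms(read_installer_products()).items():
            print(subkey, "->", path)
```

The printed subkey names tell you exactly which keys to export and edit in the registry editor.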

Upgrading MySQL 5.7.21 to 8.0.11

Upgrading MySQL 5.7.21 to 8.0.11 for WordPress 4.9.6

The process of upgrading MySQL 5.7.21 to 8.0.11 for WordPress 4.9.6 is actually quite easy (see the official MySQL documentation). And as I update my WordPress version and plugins regularly, that’s not a big deal for me.

Prepare the upgrade

  1. Back up your VM
  2. Back up your MySQL database
  3. Verify the restores work
  4. I also copy the data folder (in my case C:\MySQLDataFolder\Data), which I keep separate from the MySQL installation files as it helps me with upgrades. The only things in the C:\Program Files\MySQL\MySQL Server X.Y folder (I use the x64 version) are the MySQL application files and the my.ini file.
  5. Create the C:\Program Files\MySQL\MySQL Server 8.0 folder (for the x64 version of MySQL, otherwise use the C:\Program Files (x86) folder). Copy the content of the zip file with the MySQL files and folders in there.

I then copy the my.ini file from the current installation (C:\Program Files\MySQL\MySQL Server 5.7) into C:\Program Files\MySQL\MySQL Server 8.0 as well. It might be necessary to edit this file a bit more later, but I start out with an exact copy and one change to point to the new basedir: 5.7 becomes 8.0 in this case.

As you notice, I don’t keep the MySQL data and the ini file in the ProgramData folder on Windows. It’s fine to leave it all there if you prefer that.

# Path to installation directory. All paths are usually resolved relative to this.
basedir="C:/Program Files/MySQL/MySQL Server 8.0/"

Perform the upgrade

  • Stop the MySQL service:
    NET STOP MYSQL
  • I then remove the service from the OS:
    mysqld --remove
  • Install the MySQL service again, now with the new version path. As you see, I explicitly specify the default name of the service, MySQL, and point it to where my ini file lives so I know which ini file this service uses.
    "C:\Program Files\MySQL\MySQL Server 8.0\bin\mysqld.exe" --install MySQL --defaults-file="C:\Program Files\MySQL\MySQL Server 8.0\my.ini"
  • Start the MySQL service:
    NET START MYSQL
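
If you do this on more than one server, the four steps above can be generated rather than typed. This is a small Python sketch that only prints the commands in order as a dry run. The function name is made up, and it assumes mysqld --remove is run from the bin folder of the old 5.7 installation, which the steps above do not spell out.

```python
import subprocess  # only used here to format the command lines for printing
from pathlib import PureWindowsPath


def service_swap_commands(old_home, new_home, service="MySQL"):
    """Return the four commands from the list above, in order, as argument lists."""
    old_mysqld = str(PureWindowsPath(old_home) / "bin" / "mysqld.exe")
    new_home = PureWindowsPath(new_home)
    return [
        ["net", "stop", service],
        [old_mysqld, "--remove"],
        [str(new_home / "bin" / "mysqld.exe"), "--install", service,
         f"--defaults-file={new_home / 'my.ini'}"],
        ["net", "start", service],
    ]


# Folder names as used in this post.
commands = service_swap_commands(
    r"C:\Program Files\MySQL\MySQL Server 5.7",
    r"C:\Program Files\MySQL\MySQL Server 8.0",
)
for cmd in commands:
    print(subprocess.list2cmdline(cmd))  # dry run: print only, nothing is executed
```

Keeping the commands as argument lists avoids quoting mistakes with the spaces in the Program Files paths.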

If all goes well, that’s it: your new MySQL version is running. If so, we can jump to the part where you run the upgrade command to upgrade the system and user databases. You can verify that all went well in the error log. Its name (mine is called WORKINGHARDINIT.err) is defined in my.ini, and the file is found under the data folder. Any problems will be logged there as well. This approach makes it easy to go back if the service won’t start, as all files of the previous MySQL install are still there and you just have to install it as a service again.

Most common issues I have seen

My.ini file mistakes

The thing that goes wrong most often and causes the MySQL service not to start, based on the support I have given to some people (including myself), is the following: certain options in your ini file are not compatible with the MySQL version you just installed.

Specifically for MySQL 8.0 make sure you comment out query_cache_size=0 (put a # in front of it) or remove the entry from the my.ini file.
#query_cache_size=0

If not the MySQL service won’t start. The error logged is:
[ERROR] [MY-011071] [Server] unknown variable 'query_cache_size=0'

Next to that, if you have the sql-mode entry in there, this sometimes causes issues, so comment out that line as well, or at least remove the offending entry, which might take some trial and error. A likely culprit is NO_AUTO_CREATE_USER, which was removed in MySQL 8.0.
# sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
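
Commenting out these entries by hand is fine for one server, but the edit is simple enough to script. Here is a minimal Python sketch that works on the text of a my.ini file and comments out the two options discussed above. The function name is my own, and the default option list only covers what this post mentions; extend it if your error log points at other options.

```python
def comment_out_options(ini_text, options=("query_cache_size", "sql-mode")):
    """Return ini_text with every line that sets one of `options` commented out.

    Lines that are already comments are left alone. MySQL accepts '-' and '_'
    interchangeably in option names, so both spellings are matched.
    """
    wanted = {name.replace("-", "_") for name in options}
    result = []
    for line in ini_text.splitlines():
        stripped = line.lstrip()
        key = stripped.split("=", 1)[0].strip().replace("-", "_")
        if key in wanted and not stripped.startswith(("#", ";")):
            line = "# " + line
        result.append(line)
    return "\n".join(result)


sample = "\n".join([
    "[mysqld]",
    "query_cache_size=0",
    'sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"',
    "port=3306",
])
print(comment_out_options(sample))
```

Run it against a copy of your my.ini first and compare the result with the original before you replace the file.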

As you can see the error log is the pointer to many issues with the MySQL service failing to start so do look there, that’s how you find what to do.
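
Since the error log is where you find out what to do, a quick filter for the [ERROR] lines saves some scrolling. This is a small Python sketch, assuming the bracketed format of the log lines quoted in this post. The function name is made up and the sample line is the error shown above.

```python
import re

# Matches lines such as:
# [ERROR] [MY-011071] [Server] unknown variable 'query_cache_size=0'
ERROR_LINE = re.compile(r"\[ERROR\] \[(MY-\d+)\] \[([^\]]+)\] (.*)")


def find_errors(log_text):
    """Return a (code, subsystem, message) tuple for every [ERROR] line."""
    matches = (ERROR_LINE.search(line) for line in log_text.splitlines())
    return [m.groups() for m in matches if m]


sample_log = "[ERROR] [MY-011071] [Server] unknown variable 'query_cache_size=0'"
for code, subsystem, message in find_errors(sample_log):
    print(code, subsystem, message)
```

To use it on a real log, read the .err file from your data folder into a string and pass that to find_errors.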

Data file ibdata1 is not writable

Sometimes you can see an error like this:
[ERROR] [MY-012271] [InnoDB] InnoDB: The innodb_system data file 'ibdata1' must be writable

There are 2 common reasons. The first is that the file is locked, for example because you started mysqld.exe manually and didn’t stop it. A process you don’t know about or cannot find that accesses the file will also cause this error. A restart is then normally the easiest solution.

The second is a permission problem (but with an upgrade this should not really happen). The MySQL service account (the default one or the one you created and assigned) needs full control over the data folder. Check that.

Thinking you lost your blog

Another issue might be that MySQL is running but WordPress can’t find your blog. This can be caused by 1 or more missing files, such as ibdata1, in the data folder root (hence I always make a copy of the entire data folder before I start an upgrade, for safekeeping). In that case you’ll see a lot of warnings similar to this in the error log:

[Warning] InnoDB: Cannot open table wordpress592/wp_options from the internal data dictionary of InnoDB though the .frm file for the table exists.

So if these files are gone or corrupted, place them back from your spare copy or grab them from a backup. Most often these files are gone because of some bad advice to delete them to fix a permission issue, or because of a mistake. If they are not there, they are recreated with the correct permissions, but that causes the above issue.

Once you know the basics and you are careful, the upgrade process is straightforward for most IT enthusiasts. Once MySQL is up and running and you can access your WordPress database again, it’s time for the last step: upgrade the databases.

Perform the MySQL upgrade run

Finally, whenever you have done an upgrade, don’t forget to run mysql_upgrade.exe. This will take care of any upgrades needed to your system and user databases. Until you do, you’ll see warnings about this in the error log. If you don’t look there you might not even notice much, but it pays to complete the entire upgrade process.

To do so, from an elevated command prompt navigate to C:\Program Files\MySQL\MySQL Server 8.0\bin and run mysql_upgrade.exe -u root -p
Then enter the password and the upgrade process will kick off. This takes a while and it also depends on the amount of work the script has to do.

It first deals with the system databases. When it finds anything out of date, it takes care of it, as with the sys schema in this case.

Next it checks the user databases (in my case the WordPress database). There was clearly not much work to do here.

That’s it. You’re all up to date. I normally do this about once or twice per year to make sure the blog server is up to date (performance, security, capabilities) and I don’t introduce tech debt and potentially more involved and risky upgrade scenarios. With a free community edition, that’s a zero cost game, bar a little effort on your part.

As a final note, when everything has proven to be up and running as it should, do some housekeeping and clean up the old files and folders of previous MySQL installs that you no longer need.