Finally: systemd-networkd to the rescue

People like me, who work with different Linux distributions and automation, are always looking for ways to bridge the different styles of system configuration into one unified approach, up to the point where it does not matter whether you prefer Ubuntu, Debian, RedHat, CentOS or whatever your Linux OS of choice is. Finally systemd comes to the rescue and solves the network configuration issue with the systemd-networkd manager.

So how can you manage network configuration using systemd-networkd?

First check if you actually have it installed and running with:

systemctl status systemd-networkd

If the service is not enabled, just enable it after you have added your interfaces.

To configure interfaces, or more precisely networks in systemd, you only need to add a config file with a .network suffix. In my case /etc/systemd/network/ens33.network:

[Match]
Name=ens33

[Network]
DHCP=ipv4

[Address]
Address=10.0.2.15/24

[Address]
Address=10.0.3.15/24

The example above enables DHCP (v4) on the network interface ens33, a VMware interface (and yes, I run VMware on my MacBook), and additionally adds secondary IP addresses for haproxy testing purposes.
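Because the file format is this simple, it is easy to generate from automation. The following is a minimal sketch, not part of any systemd tooling: the function name render_network_unit is made up for illustration, and it only covers the three sections used above.

```python
def render_network_unit(interface, dhcp="ipv4", addresses=()):
    """Return the text of a systemd .network unit for one interface."""
    lines = ["[Match]", f"Name={interface}", "", "[Network]", f"DHCP={dhcp}"]
    for address in addresses:
        # One [Address] section per additional static address.
        lines += ["", "[Address]", f"Address={address}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    unit = render_network_unit("ens33", addresses=["10.0.2.15/24", "10.0.3.15/24"])
    # Print instead of writing to /etc/systemd/network/ens33.network,
    # so the sketch can be run without root privileges.
    print(unit, end="")
```

Writing the result to /etc/systemd/network/ens33.network would reproduce the file shown above.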

Once the configuration is completed, enable and restart the systemd-networkd service:

systemctl enable systemd-networkd
systemctl restart systemd-networkd

The networkctl command can now be used to monitor the lifecycle of a network.

Pretty cool, right? Finally one network manager to rule them all.

More information can be found in the systemd.network man page, which documents many more available options: http://man7.org/linux/man-pages/man5/systemd.network.5.html

How much GPU RAM does ETH Mining use?

So I had this question going around in my head for a while.
And if you use an NVIDIA card, the question can be answered pretty fast. Just use the CLI tool nvidia-smi:


Screenshot: nvidia-smi

So with the current ETH DAG #131 you can comfortably mine until April 2018, when the expected DAG size would exceed the RAM of a 3GB GPU.
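For a rough cross-check of what nvidia-smi reports, the DAG size can be estimated from the epoch number. This is a sketch using the initial size and per-epoch growth constants from the Ethash specification. It gives an upper bound, since the real size is rounded down slightly, and the memory the miner actually occupies on the card may differ.

```python
# Constants from the Ethash specification.
DATASET_BYTES_INIT = 2**30    # 1 GiB at epoch 0
DATASET_BYTES_GROWTH = 2**23  # 8 MiB added per epoch


def approx_dag_bytes(epoch):
    """Upper bound for the DAG size; the real size is rounded down slightly."""
    return DATASET_BYTES_INIT + DATASET_BYTES_GROWTH * epoch


if __name__ == "__main__":
    epoch = 131
    size_gib = approx_dag_bytes(epoch) / 2**30
    print(f"epoch {epoch}: about {size_gib:.2f} GiB")  # about 2.02 GiB
```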

It’s expected that the ETH network will switch to a “proof of stake” algorithm by then. So don’t waste your money on 8GB cards.

Tales from the crypt: Neutron metadata issues

I have been operating OpenStack since 2014 and have come across a significant number of issues, mainly around Neutron. That makes sense given the importance of Neutron inside OpenStack: without it functioning properly, none of your workloads have access to the network.

This particular situation we are looking at was reported as a performance issue for the Neutron metadata service, in a Neutron Linux bridge ML2 managed environment.

The Neutron metadata service implements a proxy in between the OpenStack instance and the Nova and Neutron services to provide Amazon AWS EC2 style metadata.
This Neutron service is important for user instances for various reasons including:
• Cloud Placement Decisions (What is my public IP etc)
• User Scripts and SSH Key injection into the boot process (typically via cloud-init)

Performance issues with this service, resulting in client timeouts or service unavailability, directly impacted cloud user workloads and led to application unavailability. The issue was further compounded by operating over 1000 instances inside one Neutron layer 2 network.

The way Neutron provides this service is by running an HTTP proxy server, the neutron-ns-metadata-proxy, wrapped inside a Linux network namespace. Network namespaces are common practice to separate routing domains in Linux, allowing custom firewall (iptables) and routing processing separate from the host OS. Additionally, the service scales per Neutron L2 network, a crucial piece of information moving forward.

What happened to this service?

A Rackspace Private Cloud OpenStack customer was reporting response times longer than 30 seconds for any request to the Neutron metadata service. Initial debugging on the user instances revealed that metadata requests got intercepted by a security appliance, but excluding the standard metadata IP, 169.254.169.254, from the proxy configuration via

export no_proxy="localhost,127.0.0.1,localaddress,.localdomain.com,169.254.169.254"

did not solve the issue. At this point I knew the issue was related to the Neutron service or the backend services it uses, mainly Nova API (compute) and RabbitMQ (the OpenStack message bus).
Looking at the requests the Neutron service handled, I identified an unusual pattern in their frequency and realized that the configuration management tool Chef was requesting the metadata far more often than the standard behavior expected when OpenStack instances boot or reboot.
From previous issues I knew that the Chef plugin ohai played a major role and that it had known inefficiencies in its HTTP connection handling, mainly the lack of support for persistent HTTP connections.
Continuing the research on the Neutron service and looking for ways to improve response times, I identified that the neutron-ns-metadata-proxy service was only capable of opening 100 Unix sockets to the neutron-metadata-agent. These sockets are used to talk to the Neutron metadata agent across the Linux network namespace without opening additional TCP connections internally, mainly as a performance optimization.

Unable to explain the 100 connections limit at first, especially in the absence of Neutron backend problems (Neutron server) or Nova API issues, I began looking at the Neutron source code and found a related change in the upstream code.
The Neutron commit added an option to parameterize the number of WSGI threads (WSGI is the web server gateway interface for Python), but it also lowered the default limit from 1000 to 100. This crucial information was absent from the Neutron release notes.

More importantly, we just found our 100 Unix sockets limit.

This also explained the second observation: connections to the Neutron metadata service got queued, which caused the large delay in response times. This queueing was a result of using the network event library eventlet in combination with greenlet, a typical way of addressing non-blocking I/O in the Python environment.
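The effect of such a cap can be illustrated with a back-of-the-envelope model. This is only a sketch: it does not use eventlet, the function name is made up, and the load figures (1000 simultaneous requests, 3 seconds per request) are hypothetical numbers, not measurements from this environment.

```python
import math


def worst_case_wait(concurrent_requests, pool_size, seconds_per_request):
    """Time until the last request completes when only pool_size run at once."""
    # Requests beyond the pool size wait in the queue for a free worker,
    # so they are served in ceil(n / pool_size) consecutive batches.
    batches = math.ceil(concurrent_requests / pool_size)
    return batches * seconds_per_request


if __name__ == "__main__":
    # Hypothetical load: 1000 simultaneous requests, 3 seconds each.
    for pool_size in (1000, 100):
        wait = worst_case_wait(1000, pool_size, 3.0)
        print(f"pool of {pool_size}: last response after {wait:.0f}s")
```

Under these assumptions, cutting the pool from 1000 to 100 stretches the slowest response from 3 to 30 seconds, which is the kind of delay the queueing produces.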

So what comes next?

Currently I am looking to solve the problem in multiple ways.
The immediate problem should be solved with a Chef ohai plugin fix as proposed in Chef pull request #995, which finally introduces persistent HTTP connections and drastically reduces the need for parallel connections. First results are encouraging.

More importantly, the Neutron community has re-implemented the neutron-ns-metadata-proxy with HAProxy (LP #1524916) to address performance issues. The OpenStack community needs to verify whether the issue still occurs.

Alternatively, there are Neutron network design decisions that can assist with these problems. For example, one approach is to reduce the size of a Neutron L2 network to smaller than 23 bits, which allows Neutron to scale out the metadata service.

This approach allows the option to create multiple Neutron routers, scaling out the Neutron metadata service onto other Neutron agents, where one router is only responsible for serving the Neutron Metadata requests. This is especially the situation when the configuration option enable_isolated_metadata is set to True and project/tenant networks are attached to Neutron routers.
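As a rough illustration of the sizing idea, here is a sketch that splits one large address range into several smaller networks with Python's ipaddress module. The 10.0.0.0/22 range is a made-up example, and reading "smaller than 23 bits" as "prefixes longer than /23" is an assumption.

```python
import ipaddress

# Hypothetical tenant range; any private /22 would do for the illustration.
large = ipaddress.ip_network("10.0.0.0/22")

# Split the range into /24 networks, each of which would become its own
# Neutron L2 network with its own metadata proxy.
small = list(large.subnets(new_prefix=24))

print(f"{large} holds {large.num_addresses} addresses")
for net in small:
    print(f"  {net} holds {net.num_addresses} addresses")
```

Each of the four resulting networks holds 256 addresses, so far fewer instances share a single metadata proxy.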

So as usual, Neutron keeps it interesting for us. I can’t wait to dissect the Neutron metadata service in a DVR environment. More to come ...

What’s up with OpenStack Swift metadata

The other day I got interested in what attributes OpenStack Swift actually stores along with an object's data.

First I had to determine the actual partition where Swift is storing the data. In my case I had a Swift ring prepared with a replication count of 2 so the data can only exist in two partitions.

The easy way to look up this information is by using swift-get-nodes:

swift-get-nodes <ring file> <URL containing account+container+path>
# swift-get-nodes /etc/swift/object-1.ring.gz /AUTH_e1496568b6864cb1b52cdfe7436c213f/test/root/hummingbird |grep lah

ssh 172.29.244.100 "ls -lah ${DEVICE:-/srv/node*}/swift4.img/objects-1/39/83a/27ea485e7f147e5e47f9c38dd0feb83a"

ssh 172.29.244.100 "ls -lah ${DEVICE:-/srv/node*}/swift5.img/objects-1/39/83a/27ea485e7f147e5e47f9c38dd0feb83a"

Let's change into the partition directory and retrieve all xattr keys and values:

# cd /srv/swift4.img/objects-1/39/83a/27ea485e7f147e5e47f9c38dd0feb83a/
# getfattr -dm '.*' 1469228674.06028.data

# file: 1469228674.06028.data
 user.swift.metadata="�}q(UC⎺┼├e┼├↑Le┼±├▒─12321▮41U┼▒└e─U>>"

Using the key name user.swift.metadata, I found out that the value for this key is a Python pickle object: https://github.com/openstack/swift/blob/master/swift/obj/diskfile.py#L133

Now let’s uncover the data of the pickle object:

Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.


>>> import xattr
>>> import pickle

>>> fh = open('1469228674.06028.data')

>>> xattr.listxattr(fh)
 (u'user.swift.metadata',)

>>> xattr.getxattr(fh,'user.swift.metadata')
 '\x80\x02}q\x01(U\x0eContent-Lengthq\x02U\x0812321041U\x04nameq\x03U>> 

>>> p = pickle.loads(xattr.getxattr(fh,'user.swift.metadata'))

>>> print p
 {'Content-Length': '12321041', 'name': '/AUTH_e1496568b6864cb1b52cdfe7436c213f/test/root/hummingbird', 'Content-Type': 'application/octet-stream', 'ETag': 'd645ab07a4a452abeeb7f3ad0ec0f7db', 'X-Timestamp': '1469228674.06028', 'X-Object-Meta-Mtime': '1466738039.627804'}

Here it is, the usual suspects are stored. This metadata is actually returned with each stat request. Quite clever: that way Swift does not need to rehash or read additional attributes for each file it serves.
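The same lookup can be sketched for Python 3 using only the standard library. The function names are made up for illustration, and the sketch assumes the whole pickle fits in the single user.swift.metadata key, as in the example above. os.getxattr is only available on Linux.

```python
import os
import pickle

METADATA_KEY = "user.swift.metadata"  # key name seen in the getfattr output


def decode_swift_metadata(raw):
    """Unpickle the raw xattr value into a dict.

    Only use this on files from your own cluster: unpickling untrusted
    data can execute arbitrary code.
    """
    # encoding="latin1" lets Python 3 load strings pickled by Python 2.
    return pickle.loads(raw, encoding="latin1")


def read_swift_metadata(path):
    """Read and decode the metadata xattr of one .data file (Linux only)."""
    return decode_swift_metadata(os.getxattr(path, METADATA_KEY))


if __name__ == "__main__":
    # Self-contained demo with a stand-in value instead of a real object file.
    sample = pickle.dumps({"Content-Length": "12321041"}, protocol=2)
    print(decode_swift_metadata(sample))
```

On a storage node, read_swift_metadata('1469228674.06028.data') would take the place of the interactive session.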

Small excursion to the undocumented OpenStack LBaaSv2 world and HAProxy

Some people, including me, like to play with new stuff. Recently I set my mind to exploring LBaaSv2 with the HAProxy namespace driver under RDO, the RedHat open source distribution of OpenStack.

Here is what I did to get the Neutron LBaaSv2 agent, including the HAProxy driver, working.

The configuration

  • Install necessary packages
yum upgrade
yum -y install openstack-neutron-lbaas haproxy
  • Enable the LoadBalancerPluginv2 inside the /etc/neutron/neutron.conf file
crudini --set /etc/neutron/neutron.conf DEFAULT service_plugins router,neutron_lbaas.services.loadbalancer.plugin.LoadBalancerPluginv2
  • Enable the HAProxy namespace driver inside the /etc/neutron/neutron_lbaas.conf file
crudini --set /etc/neutron/neutron_lbaas.conf service_providers service_provider LOADBALANCERV2:Haproxy:neutron_lbaas.drivers.haproxy.plugin_driver.HaproxyOnHostPluginDriver:default
  • Configure OVS as interface driver inside the /etc/neutron/lbaas_agent.ini file

Interestingly, RedHat did not preconfigure the interface driver for OVS, even though the distribution comes with OVS enabled as the default Neutron plugin.

crudini --set /etc/neutron/lbaas_agent.ini DEFAULT interface_driver neutron.agent.linux.interface.OVSInterfaceDriver
  • Add the necessary database tables to the neutron database
neutron-db-manage --service lbaas --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini upgrade head
  • Restart services
service neutron-server restart
service neutron-lbaasv2-agent restart
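For readers without crudini at hand, this is roughly what a crudini --set call does to an INI file, sketched with Python's configparser on an in-memory string. The helper name set_option is made up. Note that configparser discards comments and rejects duplicate options, so this is an illustration and not a crudini replacement.

```python
import configparser
import io


def set_option(config_text, section, option, value):
    """Return config_text with option = value set inside [section]."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive
    parser.read_string(config_text)
    # DEFAULT always exists; any other section is created on demand.
    if section != "DEFAULT" and not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, option, value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    text = "[DEFAULT]\nservice_plugins = router\n"
    text = set_option(
        text, "DEFAULT", "service_plugins",
        "router,neutron_lbaas.services.loadbalancer.plugin.LoadBalancerPluginv2",
    )
    print(text, end="")
```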

Testing & Creating a neutron load balancer

If all goes well, you will end up with a loaded Loadbalancerv2 agent:

# source ~/keystonerc_admin ; neutron agent-list --fields agent_type --fields alive
+----------------------+-------+
| agent_type           | alive |
+----------------------+-------+
| Open vSwitch agent   | :-)   |
| Metadata agent       | :-)   |
| DHCP agent           | :-)   |
| Loadbalancerv2 agent | :-)   |
| L3 agent             | :-)   |
+----------------------+-------+

Now let’s create a load balancer on the existing private (sub)network:

neutron lbaas-loadbalancer-create private_subnet
Created a new loadbalancer:
+---------------------+----------------------+
| Field               | Value                |
+---------------------+----------------------+
| admin_state_up      | True                 |
| description         |                      |
| id                  | **id omitted**       |
| listeners           |                      |
| name                |                      |
| operating_status    | ONLINE               |
| provider            | haproxy              |
| provisioning_status | ACTIVE               |
| tenant_id           | **id omitted**       |
| vip_address         | 10.0.0.3             |
| vip_port_id         | **id omitted**       |
| vip_subnet_id       | **id omitted**       |
+---------------------+----------------------+

I did not assign a name to the load balancer, so all subsequent commands will reference the ID c92fb015-c766-4a26-a9f2-39f03aad20e8.

neutron lbaas-listener-create --loadbalancer <lb id> --protocol HTTP --protocol-port 80
Created a new listener:
+---------------------------+----------------+
| Field                     | Value          |
+---------------------------+----------------+
| admin_state_up            | True           |
| connection_limit          | -1             |
| default_pool_id           |                |
| default_tls_container_ref |                |
| description               |                |
| id                        | **id omitted** |
| loadbalancers             |                |
| name                      |                |
| protocol                  |                |
| protocol_port             |                |
| sni_container_refs        |                |
| tenant_id                 | **id omitted** |
+---------------------------+----------------+

It’s alive

neutron lbaas-loadbalancer-show <lb id>
+---------------------+---------------------+
| Field               | Value               |
+---------------------+---------------------+
| admin_state_up      | True                |
| description         |                     |
| id                  | **id omitted**      |
| listeners           |                     |
| name                |                     |
| operating_status    | ONLINE              |
| provider            | haproxy             |
| provisioning_status | ACTIVE              |
| tenant_id           | abc                 |
| vip_address         | 10.0.0.3            |
| vip_port_id         | ID                  |
| vip_subnet_id       | **id omitted**      |
+---------------------+---------------------+

Let’s just have a look inside the qlbaas namespace and see if the haproxy process is actually running:

# ip netns |grep lbaas
qlbaas-c92fb015-c766-4a26-a9f2-39f03aad20e8
 
# ip netns exec qlbaas-c92fb015-c766-4a26-a9f2-39f03aad20e8 netstat -ntlp
 
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.0.0.3:80             0.0.0.0:*               LISTEN      14017/haproxy

 

For those who are curious how haproxy has been configured: the haproxy configuration is stored in the /var/lib/neutron/lbaas/v2/c92fb015-c766-4a26-a9f2-39f03aad20e8/haproxy.conf file, where c92fb015-c766-4a26-a9f2-39f03aad20e8 is the load balancer ID.

Multi Homing Debian/Ubuntu instances

As some CentOS/RedHat folks might know, you can use the GATEWAYDEV option inside the network configuration file to accept the default gateway only from this interface (GATEWAYDEV=eth0, for example). This is particularly useful when connecting instances to multiple networks, like a public and an internal network, to eliminate unnecessary routing while using DHCP to assign the network addresses to the interfaces. The need arises primarily when both networks, public and internal, push DHCP default router (gateway) information so that multi-homed and single-homed instances can coexist in the same networks.

One way of implementing a feature similar to the one in RedHat and its derivatives is to use DHCP client hooks to alter the DHCP options pushed from the server to the client. The DHCP client program supports enter and exit hooks, which allow for alterations before and after the interface configuration.

For this use case I implemented the following enter hook to ignore the default router information on all interfaces other than the elected one. Personally I would recommend that the public facing interface is always the first one and the Internet default gateway is using this interface. Currently all dhclient enter hooks are stored inside the /etc/dhcp/dhclient-enter-hooks.d directory and are executed in alphanumeric order. This is also the reason why I prefixed the script with the number 1.

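# Set to anything other than 'yes' to disable this hook.
RUN='yes'

if [ "$RUN" = 'yes' ]; then
  # Only act when a lease is newly bound or re-acquired after a reboot.
  if [ "$reason" = "BOUND" ] || [ "$reason" = "REBOOT" ]; then
    # Keep the default gateway only on eth0, the elected interface.
    if [ "$interface" != 'eth0' ]; then
      # Log only when the debug file already exists.
      test -f /tmp/dhclient-script.debug && echo "Stripping default GW off $interface" | tee -a /tmp/dhclient-script.debug
      # Clearing these stops dhclient-script from installing a default route.
      new_routers=""
      old_routers=""
    fi
  fi
fi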

Windows drive letters and cinder volumes?

How does Windows persist drive letters when attaching/detaching cinder volumes?

Whenever filesystems are mounted inside Windows instances, the administrator will usually assign a drive letter to the device. This information is made persistent inside the HKLM\SYSTEM\Mounted Devices registry subkey. Therefore the drive letter stays the same as long as the volume is attached to the same instance. The device order does not matter in this case. My information is based on a quote I found at windowsitpro.com:

According to a Microsoft Customer Service and Support (CSS) representative, Windows uses the disk ID as an index to store and retrieve information about the disk. For example, in the HKLM\SYSTEM\Mounted Devices registry subkey, the disk ID appears as REG_BINARY data in the \DosDevices\ and \\??\Volume{} entries because Windows uses the disk ID to store and retrieve information about persistent drive letter mappings and mount points.

And what happens if you detach and attach cinder volumes in the wrong order?

In short: nothing, as long as the volumes are attached to the same system, which retains the same copy of the registry. If the registry or the system ever changes, the drive letters will be reassigned. That’s also a good reason to choose your device descriptions wisely in case you have to recover from an instance/OS issue.

Spice console issues with RHEL/CentOS 7 instances?

After I deployed OpenStack Icehouse, we noticed spice html5 proxy console issues, in particular with CentOS 7 and RHEL 7 guests. Those guest consoles showed issues with character echoing: you were not able to see what you had typed inside the terminal. I tracked this down to a spice html5 proxy issue that occurs whenever the guest is using a frame buffer enabled console. After I disabled the frame buffer mode and switched the console to text mode, the guest console was finally usable. Here are the instructions:

Add the option “nofb nomodeset” to the GRUB_CMDLINE_LINUX variable inside the /etc/default/grub config file and regenerate the grub2 config.

  • Tested configuration /etc/default/grub:
GRUB_TIMEOUT=1
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial"
GRUB_CMDLINE_LINUX="console=ttyS0 console=tty0 crashkernel=auto vconsole.keymap=us nofb nomodeset"
GRUB_DISABLE_RECOVERY="true"
  • Rebuild the grub2 config

grub2-mkconfig -o /boot/grub2/grub.cfg

After the mandatory instance reboot, the console will boot in text mode only, without using any frame buffer graphics device. The console should work as desired at this point.
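When many images need the same change, the edit to /etc/default/grub can be scripted. The following is a sketch that works on the file content as a string. The function name add_kernel_args is made up, and it assumes GRUB_CMDLINE_LINUX is a single double-quoted line as in the tested configuration.

```python
import re


def add_kernel_args(grub_text, extra_args):
    """Append missing args to the GRUB_CMDLINE_LINUX line of grub_text."""
    def patch(match):
        current = match.group(2).split()
        # Skip args that are already present so the edit can be repeated.
        missing = [arg for arg in extra_args if arg not in current]
        merged = " ".join(current + missing)
        return f'{match.group(1)}"{merged}"'

    return re.sub(r'^(GRUB_CMDLINE_LINUX=)"([^"]*)"', patch, grub_text, flags=re.M)


if __name__ == "__main__":
    sample = 'GRUB_TIMEOUT=1\nGRUB_CMDLINE_LINUX="console=ttyS0 console=tty0"\n'
    print(add_kernel_args(sample, ["nofb", "nomodeset"]), end="")
```

The grub2 config still has to be regenerated with grub2-mkconfig after the file has been rewritten.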

Why are the Nova hypervisor statistics not updating after renaming a host while instances are running?

The nova-compute service is periodically updating hardware (VCPU, RAM, DISK) statistics for a host and is using the host name (check with hostname -f in Linux) to update the database with the available resources.

In cases where the host name has been changed while instances are running, all existing instances still reference the old host name inside the node column of the nova.instances table. All those entries need to be updated inside the nova MySQL database in order to get the correct amount of available resources for nova:

UPDATE nova.instances SET node = '<new host name>' WHERE node = '<old host name>';

Other columns such as host and launched_on should be updated in a subsequent SQL statement.
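The full set of updates can be rehearsed safely before touching the production database. The sketch below uses an in-memory SQLite table as a stand-in for nova.instances, reduced to the columns named above. The host names are hypothetical, and in a real deployment the host and node columns may hold different forms of the name (short name versus FQDN), so check the existing values first.

```python
import sqlite3

OLD, NEW = "old-host.example.com", "new-host.example.com"  # hypothetical names

db = sqlite3.connect(":memory:")
# Reduced stand-in for nova.instances: only the columns named in the text.
db.execute("CREATE TABLE instances (uuid TEXT, node TEXT, host TEXT, launched_on TEXT)")
db.execute("INSERT INTO instances VALUES ('vm-1', ?, ?, ?)", (OLD, OLD, OLD))

# One UPDATE per column so rows where only some columns match are still fixed.
with db:
    for column in ("node", "host", "launched_on"):
        db.execute(f"UPDATE instances SET {column} = ? WHERE {column} = ?", (NEW, OLD))

print(db.execute("SELECT node, host, launched_on FROM instances").fetchall())
```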

Ever wondered why Windows guests come up with the wrong time when running inside OpenStack?

OpenStack starts instances in UTC when using KVM: the simulated guest hardware clock is always set to UTC. This is independent of the host clock setting.

Windows, however, assumes the hardware clock is set to local time, so it boots up showing UTC as its local time until the time synchronization against time.windows.com finishes and corrects it to the desired time zone.

To change how Windows interprets the hardware clock, you can add this registry entry:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation]
"RealTimeIsUniversal"=dword:00000001

That will boot the instance with the correct time, since the guest and the host now interpret the hardware clock using the same time zone information.
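The size of the error without RealTimeIsUniversal can be illustrated with a few lines of Python. This is only a sketch: the UTC-6 guest time zone and the clock reading are made-up values.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical guest time zone: UTC-6, fixed offset for simplicity.
local_tz = timezone(timedelta(hours=-6))

# What KVM puts into the emulated hardware clock: the current time in UTC.
rtc_reading = datetime(2015, 6, 25, 18, 0)  # naive wall-clock value, 18:00

# Windows default: treat the reading as local time.
as_local = rtc_reading.replace(tzinfo=local_tz)
# With RealTimeIsUniversal=1: treat the reading as UTC.
as_utc = rtc_reading.replace(tzinfo=timezone.utc)

print("clock error without the registry entry:", as_local - as_utc)  # 6:00:00
```

The error always equals the guest's UTC offset, which is why guests configured for UTC never show the problem.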

Additionally, please note that Microsoft Windows Server 2008 and Windows 7 had a high CPU usage issue when changing to DST.

That has been fixed with the hotfix:

2800213 High CPU usage during DST changeover in Windows Server 2008, Windows 7, or Windows Server 2008 R2

https://support.microsoft.com/en-us/kb/2687252

Update per 6/25/2015: This hotfix is only applicable to older OpenStack releases (Havana and lower). Newer releases of OpenStack start the guest in localtime, the time zone local to the host. Additionally, OpenStack needs to be aware that the image you’re using is a Windows guest, so you have to set the os_type image property to windows. But beware of errors around those glance properties: there are known issues where the image properties are not retained if you create new images from existing nova instances.