GNU/Linux container internals aka Cgroups and Namespaces

In this post, I will shed some light on GNU/Linux container internals, that is, the underlying technology that drives containers. Here we go, without much ado:

What is a GNU/Linux container?

It is an operating-system-level virtualization method for running multiple isolated GNU/Linux systems (containers) on a single control host (the LXC host). It does not provide a virtual machine; rather, it provides a virtual environment that has its own CPU, memory, block I/O, network, etc. space. Resource control is provided by the cgroups feature of the Linux kernel on the LXC host (we will give details about it later), while the isolation itself comes from kernel namespaces. It is similar to a chroot, but offers much more isolation.
Before I tell you more about LXC, let me make you aware of its two crucial building blocks, namely cgroups and namespaces.

Cgroups aka Control Groups:

It is a Linux kernel feature to limit, police, and account for the resource usage of certain processes (actually, process groups). The cgroups subsystem is a resource-management solution that provides a generic process-grouping framework. You can put processes into cgroups:

  • On the fly, by creating and managing groups with tools like cgcreate, cgexec, cgclassify, etc.
  • Automatically, with the "rules engine daemon" (cgrulesengd), which moves processes of certain users/groups/commands into groups according to /etc/cgrules.conf; persistent groups can be set up at boot by /usr/lib/systemd/system/cgconfig.service
  • Through other software, such as LXC virtualization

The cgroups implementation needs only a few simple hooks into the rest of the kernel. From user space, cgroups show up in two places:

a) Per process: /proc/<pid>/cgroup

b) System-wide: /proc/cgroups

Luckily, newer distributions running systemd come with all of this set up by default, so don't sweat.
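To see what the per-process file tells you, here is a minimal Python sketch that parses /proc/<pid>/cgroup. Each line has the form hierarchy-ID:controller-list:cgroup-path; the function and type names are my own, not from any library, and the sample paths in the usage note are only illustrative.

```python
from typing import List, NamedTuple


class CgroupEntry(NamedTuple):
    hierarchy_id: int          # matches the "hierarchy" column of /proc/cgroups
    controllers: List[str]     # e.g. ["cpu", "cpuacct"]; ["name=systemd"] for a named
                               # hierarchy; empty on the cgroup v2 "0::" line
    path: str                  # cgroup path relative to that hierarchy's mount point


def parse_proc_pid_cgroup(text: str) -> List[CgroupEntry]:
    """Parse the contents of /proc/<pid>/cgroup (format: id:controllers:path)."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The path may itself contain ':', so split at most twice.
        hier, ctrls, path = line.split(":", 2)
        controllers = [c for c in ctrls.split(",") if c]
        entries.append(CgroupEntry(int(hier), controllers, path))
    return entries


def cgroups_of(pid: str = "self") -> List[CgroupEntry]:
    """Read and parse /proc/<pid>/cgroup for one process (Linux only)."""
    with open(f"/proc/{pid}/cgroup") as f:
        return parse_proc_pid_cgroup(f.read())
```

On a systemd distribution, calling cgroups_of() from a login shell typically shows paths under /user.slice/..., which is systemd's process grouping at work.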

A little knowledge of the internals does no harm! Here we go:

First, cgroups are implemented as a virtual file system (on top of the kernel's VFS layer), so the entries created in it are not persistent; they are deleted on reboot.

Second, all cgroup actions are performed via file system actions (creating/removing directories, reading/writing the files in them, mounting and mount options).

For example, the kernel implements:

  • cgroup inode_operations for cgroup mkdir/rmdir
  • cgroup file_system_type for cgroup mount/unmount
  • cgroup file_operations for reading/writing control files
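Since every cgroup operation is a file system action, the first thing any tool needs is where the cgroup hierarchies are mounted. A minimal Python sketch that filters /proc/self/mounts for cgroup file systems; the helper names are hypothetical, and it only covers the mount side, not mkdir/rmdir or the control files.

```python
from typing import List, NamedTuple


class CgroupMount(NamedTuple):
    mount_point: str
    fstype: str            # "cgroup" (v1) or "cgroup2" (unified hierarchy)
    options: List[str]     # for v1, includes the attached controllers, e.g. "cpuset"


def cgroup_mounts(mounts_text: str) -> List[CgroupMount]:
    """Return the cgroup file systems listed in /proc/self/mounts-formatted text."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        # Fields: device, mount point, fs type, mount options, dump, pass.
        # (Spaces inside mount points are escaped as \040 and left as-is here.)
        _device, mount_point, fstype, opts = fields[:4]
        if fstype in ("cgroup", "cgroup2"):
            result.append(CgroupMount(mount_point, fstype, opts.split(",")))
    return result


def read_cgroup_mounts() -> List[CgroupMount]:
    """Read the current process's mount table (Linux only)."""
    with open("/proc/self/mounts") as f:
        return cgroup_mounts(f.read())
```

The mounts themselves are created with ordinary mount options, for example mount -t cgroup -o cpuset cgroup /sys/fs/cgroup/cpuset, which is why the controller name shows up in the options list.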

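By default, systemd uses control groups only for process grouping (each service, scope, and slice gets its own cgroup), not for allocating resources such as block I/O bandwidth; resource limits are applied only when you configure them for a unit.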

The system-wide /proc/cgroups file looks something like this:

#subsys_name  hierarchy  num_cgroups  enabled
cpuset        8          1            1
cpu           3          2            1
cpuacct       3          2            1
blkio         4          2            1
memory        7          2            1
devices       2          41           1
freezer       5          1            1
net_cls       6          1            1
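The table above is easy to consume programmatically. A minimal Python sketch that parses /proc/cgroups text into a dictionary; the names are my own, and it assumes the four-column layout shown (subsys_name, hierarchy, num_cgroups, enabled).

```python
from typing import Dict, NamedTuple


class Subsystem(NamedTuple):
    hierarchy: int      # ID of the v1 hierarchy the controller is attached to (0 = none)
    num_cgroups: int    # number of cgroups in that hierarchy
    enabled: bool       # False if disabled, e.g. via cgroup_disable= on the kernel command line


def parse_proc_cgroups(text: str) -> Dict[str, Subsystem]:
    """Parse /proc/cgroups into {subsystem name: Subsystem}."""
    subsystems = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip the header line
            continue
        name, hierarchy, num_cgroups, enabled = line.split()
        subsystems[name] = Subsystem(int(hierarchy), int(num_cgroups), enabled == "1")
    return subsystems
```

Controllers that report the same hierarchy ID are mounted together; in the sample output, cpu and cpuacct both sit on hierarchy 3.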

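Below are a few things you can do with cgroups, provided the libcgroup tools are installed:

Example:

cgcreate -g cpuset:/test            # create the cgroup "test" in the cpuset hierarchy
cgset -r cpuset.cpus=1 /test        # tasks in it may run only on CPU 1...
cgset -r cpuset.mems=0 /test        # ...and allocate memory only from NUMA node 0
cgexec -g cpuset:/test <command>    # run <command> inside the "test" cgroup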
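Because cgroups are driven entirely through the file system, the same steps can be done without libcgroup. A rough Python sketch of the raw cgroup v1 interface, assuming a cpuset hierarchy mounted at /sys/fs/cgroup/cpuset and root privileges; run_in_cpuset is a hypothetical helper, and cgroup v2 uses a different layout (cgroup.procs, controllers enabled per subtree).

```python
import os


def run_in_cpuset(cpuset_root: str, name: str, cpus: str, mems: str, pid: int) -> str:
    """Create cgroup `name` under a cgroup-v1 cpuset hierarchy and move `pid` into it.

    Rough equivalent of:
        cgcreate -g cpuset:/name
        cgset -r cpuset.cpus=<cpus> /name
        cgset -r cpuset.mems=<mems> /name
        cgclassify -g cpuset:/name <pid>
    """
    group = os.path.join(cpuset_root, name)
    os.makedirs(group, exist_ok=True)        # mkdir == create the cgroup
    # A cpuset needs both cpus and mems set before tasks can be attached.
    for control, value in (("cpuset.cpus", cpus), ("cpuset.mems", mems)):
        with open(os.path.join(group, control), "w") as f:
            f.write(value)
    # Writing a thread ID into "tasks" moves that thread into the cgroup
    # (writing to "cgroup.procs" instead would move the whole process).
    with open(os.path.join(group, "tasks"), "w") as f:
        f.write(str(pid))
    return group
```

To mimic cgexec, a process can move itself and then exec the command, e.g. run_in_cpuset("/sys/fs/cgroup/cpuset", "test", "1", "0", os.getpid()) followed by os.execvp("sleep", ["sleep", "60"]); children inherit the cgroup.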

I have only touched the tip of the iceberg; if you are really interested in exploring more, follow the links below.

To use the tools above, you have to install libcgroup. The best places to learn about cgroups are here and here and here. Please read those links to get a thorough understanding of cgroups.

Namespaces:

a) It is lightweight process virtualization.

b) Isolation: it enables a process or a group of processes to see the system from a different perspective.

c) Much like zones in Solaris.

d) No hypervisor layer (unlike hypervisor-based virtualization such as KVM and Xen).

At the time of writing, there are six namespaces:

  • mnt (mount points and file systems)
  • pid (processes)
  • net (network stack)
  • ipc (System V IPC)
  • uts (hostname and domain name)
  • user (UIDs and GIDs)

Namespaces (starting with the mount namespace) first appeared in Linux kernel 2.4.19, way back in 2002!

Each namespace has a unique inode number, which you can see in the symbolic links under /proc/<pid>/ns/.
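Those inode numbers are embedded in the link targets, which read like net:[4026531992]. A minimal Python sketch that extracts them for one process; the function names are hypothetical, and reading another process's links generally requires root.

```python
import os
import re
from typing import Dict, Tuple

# Entries under /proc/<pid>/ns/ are symlinks whose target reads like "net:[4026531992]".
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target: str) -> Tuple[str, int]:
    """Split a namespace link target into (namespace type, inode number)."""
    m = _NS_LINK.match(target)
    if m is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return m.group(1), int(m.group(2))


def namespaces_of(pid: str = "self", proc_root: str = "/proc") -> Dict[str, int]:
    """Return {namespace entry name: inode} for one process."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    result = {}
    for entry in sorted(os.listdir(ns_dir)):
        _kind, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        result[entry] = inode
    return result
```

Two processes share a namespace exactly when the corresponding inode numbers match.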

You need to know which kernel config options enable the namespaces before you can manipulate them. Here they are:

Kernel config items:

CONFIG_NAMESPACES (umbrella option)
CONFIG_UTS_NS (uts)
CONFIG_IPC_NS (ipc)
CONFIG_USER_NS (user)
CONFIG_PID_NS (pid)
CONFIG_NET_NS (net)
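To check a running system, look at its kernel config, commonly /boot/config-$(uname -r), or /proc/config.gz when the kernel was built with CONFIG_IKCONFIG_PROC (read it with gzip.open(path, "rt")). A small Python sketch that reports the options listed above; namespace_config is a hypothetical helper name.

```python
from typing import Dict

NAMESPACE_OPTIONS = ["CONFIG_NAMESPACES", "CONFIG_UTS_NS", "CONFIG_IPC_NS",
                     "CONFIG_USER_NS", "CONFIG_PID_NS", "CONFIG_NET_NS"]


def namespace_config(config_text: str) -> Dict[str, bool]:
    """Report which namespace options are enabled (=y) in kernel .config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Enabled options look like "CONFIG_NET_NS=y"; disabled ones are
        # written as comments: "# CONFIG_NET_NS is not set".
        if "=" in line and not line.startswith("#"):
            key, value = line.split("=", 1)
            if value == "y":
                enabled.add(key)
    return {opt: opt in enabled for opt in NAMESPACE_OPTIONS}
```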

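Each option enables the corresponding namespace listed earlier; the mount namespace has no option of its own and is always available. In user space, you have two packages to play with them:

iproute (iproute2) and util-linux

Please explore those packages and what they offer in detail (for example, ip netns from iproute, and unshare and nsenter from util-linux) to work with the above. Also, keep the following findings in mind: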

How do you find all existing namespaces in GNU/Linux?

If you list /proc/1/ns as root, you get the namespaces attached to the init process (PID 1).

To find other namespaces with attached processes in the system, use these entries of PID 1 as a reference: any process or thread whose namespace ID differs from that of PID 1 does not belong to the DEFAULT namespace.
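The comparison against PID 1 can be automated. A minimal Python sketch, assuming it runs as root so every /proc/<pid>/ns link is readable; the function names are hypothetical, and it checks processes only (threads live under /proc/<pid>/task/<tid>/ns).

```python
import os
from typing import Dict, List, Tuple


def _ns_links(proc_root: str, pid: str) -> Dict[str, str]:
    """Map namespace name -> link target (e.g. 'net:[4026531992]') for one PID."""
    ns_dir = os.path.join(proc_root, pid, "ns")
    return {name: os.readlink(os.path.join(ns_dir, name)) for name in os.listdir(ns_dir)}


def non_default_namespaces(proc_root: str = "/proc") -> List[Tuple[str, str, str]]:
    """List (pid, namespace name, link target) for each namespace that differs from PID 1's."""
    reference = _ns_links(proc_root, "1")     # the "default" namespaces, as seen by init
    found = []
    for pid in sorted((p for p in os.listdir(proc_root) if p.isdigit()), key=int):
        try:
            links = _ns_links(proc_root, pid)
        except (FileNotFoundError, PermissionError):
            continue                          # process exited, or not enough privileges
        for name, target in sorted(links.items()):
            if reference.get(name) != target:
                found.append((pid, name, target))
    return found
```

Grouping the result by link target gives one entry per non-default namespace, together with the processes attached to it.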

Additionally, network namespaces created by “ip netns add” are found by default in /var/run/netns/.
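Each file in /var/run/netns/ is a bind mount of a network namespace, so its inode number matches the one shown in /proc/<pid>/ns/net. A small Python sketch that lists the named namespaces with their inode numbers; named_network_namespaces is a hypothetical helper, and the directory path is the default mentioned above.

```python
import os
from typing import Dict


def named_network_namespaces(netns_dir: str = "/var/run/netns") -> Dict[str, int]:
    """Return {name: namespace inode} for namespaces created with "ip netns add".

    Each entry is a bind mount of the namespace, so os.stat() reports the
    namespace's own inode, the same number shown as "net:[<inode>]".
    """
    if not os.path.isdir(netns_dir):
        return {}                    # no named namespaces have been created yet
    return {name: os.stat(os.path.join(netns_dir, name)).st_ino
            for name in sorted(os.listdir(netns_dir))}
```

Matching these numbers against the output of the PID 1 comparison tells you which named namespace a process lives in.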

Okay, credit has to be given where it’s due……

Rami Rosen was kind enough to provide lots of information about these topics and, most importantly, to share it with the public. Thanks, Rami! Here is his paper about it.

Check out this wonderful guide about it at LWN. There is an equally well-written blog post about it on opencloudblog, and another one here.

Here is how Docker uses namespaces, specifically the mount namespace.

Justin Weissig has written a wonderful article about cgroups.

A must-visit place is the kernel documentation about cgroups.

Ginny Henningsen and Lenz Grimmer have written a magnificent blog post on the Oracle site.

Hope this gives you a heads-up.

Cheers!

About unixbhaskar
GNU/Linux Consultant

2 Responses to GNU/Linux container internals aka Cgroups and Namespaces

  1. Pingback: Links 31/8/2015: Linux 4.2, LXLE 14.04.3 | Techrights

  2. Rami Rosen says:

    >Okay, credit has to be given where it’s due……
    >Rami Rosen was kind enough to provide lots of information about those and most importantly >share with public.Thanks Rami! Here is his paper about it.

    My pleasure!
    Rami Rosen
