What Happens When You Press a Key

Disclaimer: This article revolves around my experience with DWM and its interaction with the underlying X Window system (which is considered deprecated). You may not get any value out of this, unless you are using dwm or x11 like me. I wrote this because I enjoyed knowing about this and I hope you do too.

What happens when you press a key? (in X)

I had always wanted to try out tiling window managers. It was around Christmas, and I was on my holidays and I had some free time. I installed dwm and had successfully launched a session in it in less than 20 mins. Yayy!

I was about to configure it. The main configurations available can be classified as appearance-related and keybinding-related.

Appearance wise, tiling WMs split the screen into two sections – main area and the stack area, so that we have a side-by-side view of multiple open windows. (My steps towards becoming a 10x engineer 😜)

Keybinding wise, dwm has some hotkeys to manage the windows on the screen, their sizes, quitting dwm, etc. This is where I felt a roadblock because I didn’t understand the code that enabled these keybindings. (In my defense, the code is quite complex and it’s been several years since I last saw C programs 👀). In this post, I have basic sample code with which we can understand what happens when we press a key – apart from the character appearing on the screen.

Introducing X Window System

As mentioned earlier, DWM is a tiling window manager. It is responsible for arranging windows into the main area and the stack area, organising windows in desktop spaces, and bringing them on the screen when we switch desktops.

It is, however, not responsible for managing input from the keyboard or mouse. The X Window System (X11) does that.

X follows a client-server architecture. The client and server can run on separate machines. Several clients (roughly, the programs that own windows on the screen) can establish a connection to the server and subscribe to input events. In this context, DWM is itself a client of the X server, one that manages the windows of the other clients.

Let’s look at a small code sample that gives a basic idea of the keybinding-related code in DWM.

Sample X Client program

Pro tip: If you are also out of touch with C programs like me, you can skip to the headings below to easily make sense of what the code does.

// compile using gcc xclient.c -lX11 -o xclient
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <X11/Xlib.h>
#include <X11/keysym.h>

Display *display;
Window window;
XEvent event;

// Connect to the X server and create a small, visible window.
void setup() {
    display = XOpenDisplay(NULL);
    if (display == NULL) {
        fprintf(stderr, "Cannot open display\n");
        exit(1);
    }

    int screen = DefaultScreen(display);

    window = XCreateSimpleWindow(display, RootWindow(display, screen),
                                        10, 10, 200, 200, 1,
                                        BlackPixel(display, screen),
                                        WhitePixel(display, screen));
    XMapWindow(display, window);
}

void subscribe() {
    XSelectInput(display, window, KeyPressMask);
}

void run(XEvent event) {
  char buffer[32], *key_name;
  KeySym keysym;

  printf("%12s | %10s | %10s | %10s\n", "KeyPressed", "KeyCode", "KeySym", "Buffer");
  printf("-------------|------------|------------|-----------\n");

  while (1) {
    // Blocks until the next event arrives.
    XNextEvent(display, &event);

    if (event.type == KeyPress) {
        // Leave one byte free so the terminator below always fits.
        int len = XLookupString(&event.xkey, buffer, sizeof(buffer) - 1, &keysym, NULL);
        key_name = XKeysymToString(keysym);
        // XKeysymToString returns NULL when the keysym has no name.
        if (key_name == NULL) key_name = "N/A";

        if (len > 0) {
            buffer[len] = '\0';
        } else {
            strcpy(buffer, "N/A");
        }
        printf("%12s | %10d | %10lu | %10s\n", key_name, event.xkey.keycode, keysym, buffer);

        if (keysym == XK_Escape) break;
    }
  }
}

void cleanup() {
    XCloseDisplay(display);
}

int main () {
    setup();
    subscribe();

    run(event);

    cleanup();
    return 0;
}

There are 4 main parts in the client program.

setup

XOpenDisplay establishes a connection to the X server. We create a simple window inside the RootWindow (think of the RootWindow as the entire screen). We then map the window so that it becomes visible.

input listener

Our client program subscribes to all key press events that the server receives when the window is focused (selected).

run

The main loop of the program, which prints key information about the event.

cleanup

We close the connection.

From this, it is clear that the answer to our question lies in the run method. Let’s zoom in a little to get the full picture.

The Answer

We have 3 main library functions in the loop

XNextEvent

This blocks until an event is available, then copies the next event from the display’s queue into the event variable that we pass in.

XLookupString

This takes a key event and writes the characters it produces to buffer and the keysym to the keysym variable. It also factors in the context of the keypress, such as the modifier state.

XKeysymToString

This takes a KeySym and returns a string (the name that best denotes the KeySym).

We are printing all the values for inspection.

Now, let’s run the program.

Below are the keys I pressed in sequence while running the program:
a
Left Shift
a (with Left Shift held from the previous key press)
1 on the Numpad (with Numlock toggled off)
Num Lock
1 on the numpad again
1 (next to the tilde (`~`) symbol)
Esc

Output:

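  KeyPressed |    KeyCode |     KeySym |     Buffer
-------------|------------|------------|-----------
           a |         38 |         97 |          a
     Shift_L |         50 |      65505 |        N/A
           A |         38 |         65 |          A
      KP_End |         87 |      65436 |        N/A
    Num_Lock |         77 |      65407 |        N/A
        KP_1 |         87 |      65457 |          1
           1 |         10 |         49 |          1
      Escape |          9 |      65307 |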

We can see that the first and third row have the same keycodes.

KeySym however changes. (You should be having a 💡 next to your head by now)

We can see that the sixth and seventh row have same printable characters (buffers) although the keycodes and keysyms change.

Let’s define these terms.

Keycode

Keycodes represent physical keys on the keyboard. Each physical key is assigned a number during startup, and that number does not change thereafter. (Mouse buttons are identified by button numbers rather than keycodes.)

Keysym

Keysyms are logical numbers for keys. They are derived from the context of the keypress, such as the state of modifier keys (Shift, Num Lock, Caps Lock, etc.). The same physical key may produce different keysyms in different contexts.

X maintains both of these mappings and uses them to compute the symbol to be emitted.
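The KeySym numbers in the output above are easier to read in hex. As far as I understand the keysym definitions, Latin-1 characters use their character code as the keysym (97 is 0x61, the letter a), while modifier, keypad and similar keys sit in the 0xff00 block (65505 is 0xffe1, Shift_L). Below is a minimal sketch of that split. The function names are my own and are not part of Xlib.

```c
#include <stdbool.h>

/* Latin-1 keysyms share their value with the character code,
 * which is why 'a' shows up as 97 in the output above. */
bool keysym_is_latin1(unsigned long keysym) {
    return (keysym >= 0x20 && keysym <= 0x7e) ||
           (keysym >= 0xa0 && keysym <= 0xff);
}

/* Modifier, keypad, cursor and similar keys sit in the 0xff00 block. */
bool keysym_is_function_block(unsigned long keysym) {
    return keysym >= 0xff00 && keysym <= 0xffff;
}
```

This also explains the sixth and seventh rows: KP_1 (65457) lives in the 0xff00 block and 1 (49) is a Latin-1 keysym, yet XLookupString turns both into the same printable character.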

Shell Utilities

X provides several CLI tools for us to interact with it. I have taken two commands to demonstrate the mappings and events we discussed in our program.

xmodmap

xmodmap -pk

There are 7 KeySyms per KeyCode; KeyCodes range from 8 to 255.

KeyCode     Keysym (Keysym) ...
Value       Value   (Name)  ...

    ... clipped many lines ...
 24         0x0071 (q)      0x0051 (Q)      0x0071 (q)      0x0051 (Q)
 25         0x0077 (w)      0x0057 (W)      0x0077 (w)      0x0057 (W)
 26         0x0065 (e)      0x0045 (E)      0x0065 (e)      0x0045 (E)
 27         0x0072 (r)      0x0052 (R)      0x0072 (r)      0x0052 (R)
 28         0x0074 (t)      0x0054 (T)      0x0074 (t)      0x0054 (T)
 29         0x0079 (y)      0x0059 (Y)      0x0079 (y)      0x0059 (Y)
 30         0x0075 (u)      0x0055 (U)      0x0075 (u)      0x0055 (U)
 31         0x0069 (i)      0x0049 (I)      0x0069 (i)      0x0049 (I)
 32         0x006f (o)      0x004f (O)      0x006f (o)      0x004f (O)
 33         0x0070 (p)      0x0050 (P)      0x0070 (p)      0x0050 (P)
 34         0x005b (bracketleft)    0x007b (braceleft)      0x005b (bracketleft)    0x007b (braceleft)
    ... clipped many lines ...

The left most column shows the keycode.

The second column shows the primary keysym (lowercase character) that will be printed when a key is pressed without any modifiers

The third column shows the keysym and the character that will be printed when a key is pressed with Shift modifier

The fourth column shows the keysym and the character that will be printed when a key is pressed with Mode_Switch (sometimes mapped to the AltGr key). Some keyboards can emit more than 2 characters with the same key.

The fifth column shows Mode_Switch + Shift, and so on.
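The column rule described above can be sketched in a few lines of C. This is a simplification that only considers Shift and Mode_Switch; it ignores Caps Lock, Num Lock and anything XKB does on top. The row is copied from the listing above (keycode 34), and the function names are made up for illustration.

```c
#include <stddef.h>

/* One row of `xmodmap -pk`, copied from the listing above (keycode 34). */
const unsigned long row_34[4] = {
    0x005b, /* bracketleft: no modifier */
    0x007b, /* braceleft:   Shift */
    0x005b, /* bracketleft: Mode_Switch */
    0x007b, /* braceleft:   Mode_Switch + Shift */
};

/* Column index within a row: Shift picks the odd column,
 * Mode_Switch moves two columns to the right. */
size_t keysym_column(int shift, int mode_switch) {
    return (mode_switch ? 2u : 0u) + (shift ? 1u : 0u);
}

/* Pick the keysym for a row under the given modifier state. */
unsigned long pick_keysym(const unsigned long row[4], int shift, int mode_switch) {
    return row[keysym_column(shift, mode_switch)];
}
```

Calling pick_keysym(row_34, 1, 0) selects the Shift column, which holds braceleft.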

We can also use the above utility to change the keybindings on startup.

For instance, we can run the command below to swap the left and right clicks on a mouse (one way to impress our left-handed friends 😉).

xmodmap -e 'pointer = 3 2 1'
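To see what the list 3 2 1 means: the list starts with the first physical button, and each entry is the logical button that the physical button should report. Here is a toy model of that lookup. The names logical_button and swapped_map are hypothetical and do not come from Xlib or xmodmap.

```c
/* The map from the command above: physical 1 -> 3, 2 -> 2, 3 -> 1. */
const unsigned char swapped_map[3] = { 3, 2, 1 };

/* Translate a physical button (1-based) through a pointer map.
 * Returns 0 if the button is outside the map. */
unsigned int logical_button(const unsigned char *map, unsigned int nmap,
                            unsigned int physical) {
    if (physical == 0 || physical > nmap) {
        return 0;
    }
    return map[physical - 1];
}
```

With this map, a press on the physical left button (1) is reported as button 3, which applications treat as a right click.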

xev

xev is a tool that prints all the input events happening in a particular window (like our C program).

xev -event mouse -event keyboard

Notice how the LeaveNotify, EnterNotify and MotionNotify events are shown as the cursor moves into, out of, and inside the window. The KeyPress and KeyRelease events can also be seen.

Seems cool, right? That’s how we can see all the input events tracked by X.

I hope this gives a better idea of how X handles input events.

Moving forward

Coming back to the roadblock (or rather a little speedbump) we faced earlier in understanding DWM’s keybinding-related code: DWM follows a somewhat similar approach.

DWM also relies on X to deliver input events, but it intercepts only specific ones. The difference is that it uses XGrabKey and XGrabButton. These calls let a client ask the X server to always route certain key strokes or button presses to it. This differs from XSelectInput, which subscribes a client to the selected kinds of events on one window (in our program, key presses while our window is focused). XGrabKey is well suited to defining hotkeys, which is what DWM uses it for. DWM grabs these keys on the RootWindow itself, so its keybindings work regardless of which window is currently focused.
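Once a grabbed key press arrives, the window manager still has to decide which hotkey it was. Below is a rough sketch of that matching step, loosely modelled on how I read dwm’s key handler: walk a table of bindings and compare the keysym and the modifier state, ignoring the lock modifiers. Everything prefixed with toy_ or TOY_ is made up for this post. The bindings are hypothetical, not dwm’s defaults, and treating Mod2 as Num Lock is an assumption (dwm works that out at runtime).

```c
#include <stddef.h>

/* Values mirror ShiftMask, LockMask, ControlMask, Mod1Mask and Mod2Mask
 * from <X11/X.h>; redefined here so the sketch builds without Xlib. */
enum {
    TOY_SHIFT   = 1 << 0,
    TOY_LOCK    = 1 << 1,  /* Caps Lock */
    TOY_CONTROL = 1 << 2,
    TOY_MOD1    = 1 << 3,  /* usually Alt */
    TOY_MOD2    = 1 << 4   /* assumed to be Num Lock */
};

enum ToyAction { TOY_NONE, TOY_SPAWN, TOY_QUIT };

typedef struct {
    unsigned int mod;       /* required modifier state */
    unsigned long keysym;   /* unshifted keysym of the key */
    enum ToyAction action;  /* what to do on a match */
} ToyBinding;

/* Hypothetical bindings, not dwm's defaults. */
static const ToyBinding toy_bindings[] = {
    { TOY_MOD1,             0xff0d, TOY_SPAWN }, /* Alt + Return */
    { TOY_MOD1 | TOY_SHIFT, 0x0071, TOY_QUIT  }, /* Alt + Shift + q */
};

/* Drop the lock bits so hotkeys work with Caps Lock or Num Lock on. */
static unsigned int clean_mask(unsigned int state) {
    return state & ~(unsigned int)(TOY_LOCK | TOY_MOD2);
}

/* Return the action bound to this key press, or TOY_NONE. */
enum ToyAction toy_dispatch(unsigned int state, unsigned long keysym) {
    size_t count = sizeof toy_bindings / sizeof toy_bindings[0];
    for (size_t i = 0; i < count; i++) {
        if (toy_bindings[i].keysym == keysym &&
            clean_mask(toy_bindings[i].mod) == clean_mask(state)) {
            return toy_bindings[i].action;
        }
    }
    return TOY_NONE;
}
```

Masking out the lock bits ties back to the Num Lock experiment earlier: toggling Num Lock changes the modifier state of every later key press, and a hotkey should not stop working because of it.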

Happy hacking!
