Nibble Stew: December 2017

Sunday, December 31, 2017

These three things could improve the Linux development experience dramatically, #2 will surprise you

The development experience on a modern Linux system is fairly good, however there are several strange things, mostly due to legacy things no longer relevant, that cause weird bugs, hassles and other problems. Here are three suggestions for improvement:

1. Get rid of global state

There is a surprisingly large amount of global (mutable) state everywhere. There are also many places where said global state is altered in secret. As an example let's look at pkg-config files. If you have installed some package in a temporary location and request its linker flags with pkg-config --libs foo, you get out something like this:

-L/opt/lib -lfoo

The semantic meaning of these flags is "link against libfoo.so that is in /opt/lib". But that is not what these flags do. What they actually mean is "add /opt/lib to the global link library search path, then search for foo in all search paths". This has two problems. First of all, the linker might, or might not, use the library file in /opt/lib. Depending on other linker flags, it might find it somewhere else. But the bigger problem is that the -L option remains in effect after this. Any library search later might pick up libraries in /opt/lib that it should not have. Most of the time things work. Every now and then they break. This is what happens when you fiddle with global state.

The fix to this is fairly simple and requires only changing the pkg-config file generator so it outputs the following for --libs foo:

/opt/lib/libfoo.so

2. Get rid of -lm, -pthread et al

Back when C was first created, libc had very little functionality in it. Because of reasons, new functionality was added it went to its own library that you could then enable with a linker flag. Examples include -lm to add the math library and -ldl to get dlopen and friends. Similarly when threads appeared, each compiler had its own way of enabling them, and eventually any compiler not using -pthread died out.

If you look at the compiler flags in most projects there are a ton of gymnastics for adding all these flags not only to compiler flags but also to things like .pc files. And then there is code to take these flags out again when e.g. compiling on Visual Studio. And don't even get me started on related things like ltdl.

All of this is just pointless busywork. There is no reason all these could not be in libc proper and available and used always. It is unlikely that math libraries or threads are going to go away any time soon. In fact this has already been done in pretty much any library that is not glibc. VS has these by default, as does OSX, the BSDs and even alternative Linux libcs. The good thing is that Glibc maintainers are already in the process of doing this transition. Soon all of this pointless flag juggling will go away.

3. Get rid of 70s memory optimizations

Let's assume you are building an executable and that your project has two internal helper libraries. First you do this:

gcc -o myexe myexe.o lib1.a lib2.a

This gives you a linker error due to lib2 missing some symbols that are in lib1. To fix this you try:

gcc -o myexe myexe.o lib2.a lib1.a

But now you get missing symbols in lib1. The helper libraries have a circular dependency so you need to do this:

gcc -o myexe myexe.o lib1.a lib2.a lib1.a

Yes, you do need to define lib1 twice. The reason for this lies in the fact that in the 70s memory was limited. The linker goes through the libraries one by one. When it process a static library, it copies all symbols that are listed as missing and then throws away the rest. Thus if lib2 requires any symbol that myexe.o did not refer to, tough luck, all those symbols are gone. The only way to access them is to add lib1 to the linker line and have it processed in full for a second time.

This simple issue can be fixed by hand but things get more complicated if the come from external dependencies. The correct fix for this would be to change the linker to behave roughly like this:

Go through the entire linker line and find all libraries.
Look which point to same physical files and deduplicate them
Wrap all of these in a single -Wl,--start-group -Wl,--end-group
Do symbol lookup once in a global context

This is a fair bit of work and may cause some breakage. On the other hand we do know that this works because many linkers already do this, for example Visual Studio and LLVM's new lld linker.

Tuesday, December 26, 2017

Creating an USB image that boots to a single GUI app from scratch

Every now and than you might want or need to create a custom Linux install that boots from a USB stick, starts a single GUI application and keeps running that until the user turns off the power. As an example at a former workplace I created an application for downloading firmware images from an internal server and flashing those. The idea there was that even non-technical people could walk up to the computer, plug in their device via USB and push a button to get it flashed.

Creating your own image based on latest stable turns out to be relatively straightforward, though there are a few pitfalls. The steps are roughly the following:

Create a Debian boostrap install
Add dependencies of your program and things like X, Network Manager etc
Install your program
Configure the system to automatically login root on boot
Configure root to start X upon login (but only on virtual terminal 1)
Create an .xinitrc to start your application upon X startup

Information on creating a bootable Debian live image can easily be found on the Internet. Unfortunately information on setting up the boot process is not as easy to find, but is instead scattered all over the place. A lot of documentation still refers to the sysvinit way of doing things that won't work with systemd. Rather than try to write yet another blog post on the subject I instead created a script to do all that automatically. The code is available in this Github repo. It's roughly 250 lines of Python.

Using it is simple: insert a fresh USB stick in the machine and see what device name it is assigned to. Let's assume it is /dev/sdd. Then run the installer:

sudo ./createimage.py /dev/sdd

Once the process is complete and you can boot any computer with the USB stick to see this:

This may not look like much but the text in the top left corner is in fact a PyGTK program. The entire thing fits in a 226 MB squashfs image and takes only a few minutes to create from scratch. Expanding the program to have the functionality you want is then straightforward. The Debian base image takes care of all the difficult things like hardware autodetection, network configuration and so on.

Problems and points of improvement

The biggest problem is that when booted like this the mouse cursor is invisible. I don't know why. All I could find were other people asking about the same issue but no answers. If someone knows how to fix this, patches are welcome.

The setup causes the root user to autologin on all virtual terminals, not just #1.

If you need to run stuff like PulseAudio or any other thing that requires a full session, you'll probably need to install a full DE session and use its kiosk mode.

This setup runs as root. This may be good. It may be bad. It depends on your use case.

For more complex apps you'd probably want to create a DEB package and use it to install dependencies rather than hardcoding the list in the script as is done currently.

Saturday, December 23, 2017

"A simple makefile" is a unicorn

Whenever there is a discussion online about the tools to build software, there is always That One Person that shows up and claims that all build tools are useless bloated junk and that you should "just write a simple Makefile" because that is lean, efficient, portable and does everything anyone could ever want.

Like every sentence that has the word "just", this is at best horribly simplistic but mostly plain wrong. Let's dive in more detail into this. If you look up simple Makefiles on the Internet, you might find something like this page. It starts with a very simple (but useless) Makefile and eventually improves it to this:

IDIR =../include
CC=gcc
CFLAGS=-I$(IDIR)

ODIR=obj
LDIR =../lib

LIBS=-lm

_DEPS = hellomake.h
DEPS = $(patsubst %,$(IDIR)/%,$(_DEPS))

_OBJ = hellomake.o hellofunc.o
OBJ = $(patsubst %,$(ODIR)/%,$(_OBJ))

$(ODIR)/%.o: %.c $(DEPS)
$(CC) -c -o $@ $< $(CFLAGS)

hellomake: $(OBJ)
gcc -o $@ $^ $(CFLAGS) $(LIBS)

.PHONY: clean

clean:
rm -f $(ODIR)/*.o *~ core $(INCDIR)/*~

Calling this "simple" is a bit of a stretch. This snippet contains four different kinds of magic expansion variables, calls three external commands (two of which are gcc, just with different ways) and one Make's internal command (bonus question: is patsubst a GNU extension or is it available in BSD Make? what about NMake?) and requires the understanding of shell syntax. It is arguable whether this could be called "simple", especially for newcomers. But even so, this is completely broken and unreliable.

As an example, if you change any header files used by the sources, the system will not rebuild the targets. To fix these issues you need to write more Make. Maybe something like this example, described as A Super-Simple Makefile for Medium-Sized C/C++ Projects:

TARGET_EXEC ?= a.out

BUILD_DIR ?= ./build
SRC_DIRS ?= ./src

SRCS := $(shell find $(SRC_DIRS) -name *.cpp -or -name *.c -or -name *.s)
OBJS := $(SRCS:%=$(BUILD_DIR)/%.o)
DEPS := $(OBJS:.o=.d)

INC_DIRS := $(shell find $(SRC_DIRS) -type d)
INC_FLAGS := $(addprefix -I,$(INC_DIRS))

CPPFLAGS ?= $(INC_FLAGS) -MMD -MP

$(BUILD_DIR)/$(TARGET_EXEC): $(OBJS)
$(CC) $(OBJS) -o $@ $(LDFLAGS)

# assembly
$(BUILD_DIR)/%.s.o: %.s
$(MKDIR_P) $(dir $@)
$(AS) $(ASFLAGS) -c $< -o $@

# c source
$(BUILD_DIR)/%.c.o: %.c
$(MKDIR_P) $(dir $@)
$(CC) $(CPPFLAGS) $(CFLAGS) -c $< -o $@

# c++ source
$(BUILD_DIR)/%.cpp.o: %.cpp
$(MKDIR_P) $(dir $@)
$(CXX) $(CPPFLAGS) $(CXXFLAGS) -c $< -o $@

.PHONY: clean

clean:
$(RM) -r $(BUILD_DIR)

-include $(DEPS)

MKDIR_P ?= mkdir -p

It's unclear what the appropriate word to describe this thing is, but simple would not be at the top of the list for many people.

Even this improved version is broken and unreliable. The biggest issue is that changing compiler flags does not cause a recompile, only timestamps do. This is a common reason for silent build failures. It also does not provide for any way to configure the build depending on the OS in use. Other missing pieces that should be considered entry level features for build systems include:

No support for multiple build types (debug, optimized), changing build settings requires editing the Makefile
Output directory is hardcoded, you can't have many build directories with different setups
No install support
Does not work with Visual Studio
No unit testing support
No support for sanitizers apart from manually adding compiler arguments
No support for building shared libraries, apart from manually adding compiler arguments (remember to add -shared in your object file compile args ... or was it on link args ... or was it -fPIC)
No support for building static libraries at all
And so on and so on

As an example of a slightly more advanced feature, cross compilation is not supported at all.

These are all things you can add to this supposedly super simple Makefile, but the result will be a multi-hundred (thousand?) line monster of non-simplicityness.

Conclusions

Simple makefiles are a unicorn. A myth. They are figments of imagination that have not existed, do not exist and will never exist. Every single case of a supposedly simple Makefile has turned out to be a mule with a carrot glued to its forehead. The time has come to let this myth finally die.

Thursday, December 7, 2017

Comparing C, C++ and D performance with a real world project

Some time ago I wrote a blog post comparing the real world performance of C and C++ by converting Pkg-config from C to C++ and measuring the resulting binaries. This time we ported it to D and running the same tests.

Some caveats

I got comments that the C++ port was not "idiomatic C++". This is a valid argument but also kind of the point of the test. It aimed to test the behavior of ported code, not greenfield rewrites. This D version is even more unidiomatic, mostly because this is the first non-trivial D project I have ever done. An experienced D developer could probably do many of the things much better than what is there currently. In fact, there are parts of the code I would do differently based solely on the things I learned as the project progressed.

The code is available in this Github repo. If you wish to use something else than GDC, you probably need to tweak the compiler flags a bit. It also does not pass the full test suite. Once the code was in good enough condition to pass the Gtk+ test needed to get the results on this post, motivation to keep working on it dropped a fair bit.

The results

The result array is the same as in the original post, but the values for C++ using stdlibc++ have been replaced with corresponding measurements from GDC.

GDC C++ libc++ C

Optimized exe size 364kB 153kB 47kB

minsize exe size 452kB 141kB 43kB

3rd party dep size 0 0 1.5MB

compile time 3.9s 3.3s 0.1s

run time 0.10s 0.005s 0.004s

lines of code 3249 3385 3388

memory allocations 151 8571 5549

Explicit deallocation calls 0 0 79

memory leaks 7 0 >1000

peak memory consumption 48.8kB 53kB 56kB

Here we see that code size is not D's strong suite. As an extra bit of strangeness the size optimized binary took noticeably more space than the regular one. Compile times are also unexpectedly long given that D is generally known for its fast compile times. During development GDC felt really snappy, though, printing error messages on invalid code almost immediately. This would indicate that the slowdown is coming from GDC's optimization and code generation passes.

The code base is the smallest of the three but not by a huge margin. D's execution time is the largest of the three but most of that is probably due to runtime setup costs, which are amplified in a small program like this.

Memory consumption is where things get interesting. D uses a garbage collector by default whereas C and C++ don't, requiring explicit deallocation either manually or with RAII instead. The difference is clear in the number of allocations done by each language. Both C and C++ have allocation counts in the thousands whereas D only does 151 of them. Even more amazingly it manages to beat the competition by using the least amount of memory of any of the tested languages.

Memory graphs

A massif graph for the C++ program looked like this:

This looks like a typical manual memory management graph with steadily increasing memory consumption until the program is finished with its task and shuts down. In comparison D looks like the following:

D's usage of a garbage collector is readily apparent here. It allocates a big chunk up front and keeps using it until the end of the program. In this particular case we see that the original chunk was big enough for the whole workload so it did not need to grow the size of the memory pool. The small jitter in memory consumption is probably due to things such as file IO and work memory needed by the runtime.

The conversion and D as a language

The original blog posted mentioned that converting the C program to C++ was straightforward because you could change things in very small steps (including individual items in structs) while keeping the entire test suite running for the entire time. The D conversion was the exact opposite.

It started from the C++ one and once the files were renamed to D, nothing worked until all of the code was proper D. This meant staring at compiler failure messages and fixing issues until they went away (which took several weeks of work every now and then when free time presented itself) and then fixing all of the bugs that were introduced by the fixes. A person proficient in D could probably have done the whole thing from scratch in a fraction of the time.

As a language D is a slightly weird experience. Parts of it are really nice such as the way it does arrays and dictionaries. Much of it feels like a fast, typed version of Python, but other things are less ergonomic. For example you can do if(item in con) for dictionaries but not for arrays (presumably due to the potential for O(n) iterations).

Perhaps the biggest stepping stone is the documentation. There are nice beginner tutorials but intermediate level documentation seems to be scarce or possibly it's just hard to bing for. The reference documentation seems to be written by experts for other experts as tersely as possible. For comparison Python's reference documentation is both thorough and accessible in comparison. Similarly the IDE situation is unoptimal, as there are no IDEs in Ubuntu repositories and the Eclipse one I used was no longer maintained and fairly buggy (any programming environment that does not have one button go-to-definition and reliably working ctrl+space is DOA, sorry).

Overall though once you get D running it is nice. Do try it out if you haven't done so yet.