Categories
Debugging Java

Breaking JarURLConnection

Consider the following:

URL url = new URL("jar:file:test.jar!/file.txt");
String one = readStringFromUrl(url);
String two = readStringFromUrl(url);
assertThat(one, equalTo(two));

This should totally be a green test, right? And if you’re using a current Java version (like 21, or 17, hell, even 11 will do), it is a green test.
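
To try this at home you need a JAR file and some implementation of readStringFromUrl, neither of which is shown above. The following is a minimal sketch under my own assumptions: the class name, the createJarWithEntry helper and the body of readStringFromUrl are made up for illustration, and the JAR is a perfectly ordinary one with a single entry.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class JarUrlReadDemo {

    /** Hypothetical stand-in for the readStringFromUrl helper used above. */
    static String readStringFromUrl(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int read;
            /* copy in a loop so that this also runs on Java 8. */
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Writes a temporary JAR with one entry and returns a jar: URL for that entry. */
    static URL createJarWithEntry(String entryName, String content) throws IOException {
        Path jarFile = Files.createTempFile("test", ".jar");
        jarFile.toFile().deleteOnExit();
        try (OutputStream file = Files.newOutputStream(jarFile);
             JarOutputStream jar = new JarOutputStream(file)) {
            jar.putNextEntry(new JarEntry(entryName));
            jar.write(content.getBytes(StandardCharsets.UTF_8));
            jar.closeEntry();
        }
        /* a jar: URL is the URL of the archive, then "!/", then the entry name. */
        return new URL("jar:" + jarFile.toUri() + "!/" + entryName);
    }

    public static void main(String[] args) throws IOException {
        URL url = createJarWithEntry("file.txt", "hello");
        String one = readStringFromUrl(url);
        String two = readStringFromUrl(url);
        System.out.println(one.equals(two));
    }
}
```

With a normal JAR like this one both reads return the same content on any Java version; the interesting case needs the specially-crafted JAR described next.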

With Java 8 and a special JAR file, however, it is not… which I found out going the other way, i.e. with a test that failed with Java 11 when it was fine with Java 8.

This test created a JAR file with two files in it; both had the same name but different content. This may or may not be technically valid but it is possible to do that, by convincing Java’s JarFile (using evil reflection magic) that it has never seen the name of the file after writing it once. This specially-crafted JAR file was then handed into a ClassLoader, which was in turn used with a ServiceLoader. With Java 8, it then returned two objects, of two different classes, as expected.

(In hindsight, that expectation was… weird, to say the least, as I knew that the ClassLoader method used by the ServiceLoader returned URL objects, and I knew that the URLs it returned would be identical; why I thought they could ever yield different data, I’m not entirely sure.)

With Java 11, two objects were returned but they were both of the same class. At first I suspected that the JAR was somehow written incorrectly but could quickly verify that it was indeed very much written as intended, just like before.

The next step was to dive right into the ServiceLoader. It uses a lot of different iterators to create the iterator it returns but I finally managed to find the place where the URLs returned by the class loader were being opened, read, and parsed. The logic here changed a bit between Java 8 and Java 11 but even after running it dozens of times in the debugger and staring at it for prolonged amounts of time I did not achieve any more clarity.

Okay, so, what about the code that reads data from the URLs? That job is delegated all the way down to a JarURLConnection (of which there are two!), and lo and behold, apparently it’s using a cache! A comparison between Java 8 and Java 11 showed that this particular code didn’t really change, though, so that cannot be the reason for the observed differences in behaviour, either.

So I dug deeper into how the URL connections were actually made, and here I finally struck gold: The implementation of ZipFile.getEntry() changed drastically between Java 8 and Java 11, from a native implementation in Java 8 to a Java-based solution. The latter uses a hash table based on the hash of an entry’s name so requesting two entries with the same name (as I did in the test) would indeed return the same entry. I am guessing that Java 8’s native implementation, when asked to locate an entry, actually locates the next entry with the given name, i.e. it did not always start at the beginning of the ZIP file’s central directory. I can’t be bothered to locate the native source code to confirm that, though.
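
To make the difference between the two lookup strategies a bit more tangible, here is a toy model. It is emphatically not the JDK's code: the Entry class, both method names, and the cursor behaviour are my own assumptions, the latter simply mirroring the guess about the native implementation made above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicateEntryLookup {

    /** A toy directory entry: a name plus the content stored under it. */
    static final class Entry {
        final String name;
        final String content;

        Entry(String name, String content) {
            this.name = name;
            this.content = content;
        }
    }

    /** Name-keyed lookup: every request for a name yields the same entry. */
    static List<String> lookupByHash(List<Entry> directory, String name, int requests) {
        Map<String, Entry> index = new HashMap<>();
        for (Entry entry : directory) {
            /* keeping the first entry per name is an arbitrary choice for this model. */
            index.putIfAbsent(entry.name, entry);
        }
        List<String> results = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            Entry hit = index.get(name);
            if (hit != null) {
                results.add(hit.content);
            }
        }
        return results;
    }

    /** Cursor-based lookup: each request continues scanning after the previous hit. */
    static List<String> lookupByCursor(List<Entry> directory, String name, int requests) {
        List<String> results = new ArrayList<>();
        int cursor = 0;
        for (int i = 0; i < requests; i++) {
            for (int scanned = 0; scanned < directory.size(); scanned++) {
                Entry candidate = directory.get((cursor + scanned) % directory.size());
                if (candidate.name.equals(name)) {
                    results.add(candidate.content);
                    cursor = (cursor + scanned + 1) % directory.size();
                    break;
                }
            }
        }
        return results;
    }
}
```

For a directory with two entries of the same name, the hash-based lookup hands out the same content for both requests while the cursor-based one hands out each entry in turn, which is the pattern the test saw on Java 11 and Java 8, respectively.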

Categories
Java Kotlin Streams / Collections

Getting a Ratio Using a Single Iteration

Recently I came across the question of how to calculate the ratio of elements of a stream that match some condition to the total number of elements in said stream. The naïve solution would be:

collection.stream()
  .filter(it -> it.completed())
  .count() / collection.size()

However, depending on where collection came from, this might iterate it twice, producing unnecessary load or I/O while doing so. Also, we would need to cast one of the two values to a double; otherwise integer division would make the result 0 pretty much all the time.

This is very ugly.

One possible solution is to use a Collector to do the ratio calculation for us. For this we would need a Collector that keeps track of the total number of elements and the number of elements that are completed. Lucky for us there is such a Collector already: the averaging collector. If we map all completed elements to a 1 and all not-completed elements to a 0, the result of the average will match the ratio we are expecting:

collection.stream()
    .collect(Collectors.averagingInt(it -> it.completed() ? 1 : 0));
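
Here is a small, self-contained sketch that puts both variants next to each other. The Task class and the method names are made up for illustration; only completed() comes from the snippets above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CompletionRatio {

    /** Hypothetical element type; it only needs a completed() method. */
    static final class Task {
        private final boolean completed;

        Task(boolean completed) {
            this.completed = completed;
        }

        boolean completed() {
            return completed;
        }
    }

    /** Naïve variant: counts the matches, then asks the collection for its size. */
    static double ratioWithCast(List<Task> tasks) {
        return (double) tasks.stream().filter(Task::completed).count() / tasks.size();
    }

    /** Single pass: the average of 1s and 0s is the ratio of completed elements. */
    static double ratioWithCollector(List<Task> tasks) {
        return tasks.stream()
            .collect(Collectors.averagingInt(it -> it.completed() ? 1 : 0));
    }

    public static void main(String[] args) {
        List<Task> tasks = Arrays.asList(
            new Task(true), new Task(false), new Task(true), new Task(true));
        System.out.println(ratioWithCast(tasks));
        System.out.println(ratioWithCollector(tasks));
    }
}
```

One difference to keep in mind: for an empty collection the averaging collector returns 0, whereas the division in the naïve variant yields NaN.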

In Kotlin, there is an average function defined on Iterable<Int> so you can do something very similar:

collection.map { if (it.completed()) 1 else 0 }.average()

You could even combine that with an extension function and turn it into:

collection.map(Foo::toCompletion).average()
…
private fun Foo.toCompletion() =
  if (completed()) 1 else 0
Categories
Bit Manipulation Java

Bit Fiddlery

Recently I tried to parse FLAC headers. In the STREAMINFO block there are several fields that have a width of non-multiple-of-8 bits so I had to create a function that could read a number of bits starting at an arbitrary bit.

/**
 * Reads numberOfBits bits from the given buffer, starting at the given byte
 * and bit offsets. Bits are assumed to be numbered MSB-first, i.e. the
 * highest bit in a byte (0x80) is considered bit 0.
 * 
 * Example UUID: B24E931F-FFC5-4F3C-A6FF-E667BDB5F062
 */
long parseBits(byte[] data, int byteOffset, int bitOffset, int numberOfBits) {
  long value = 0;
  int currentByteOffset = byteOffset;
  int currentBitOffset = bitOffset;
  int bitsRemaining = numberOfBits;

  /* while we still need some bits... */
  while (bitsRemaining > 0) {

    /* shift the current value by the number of bits we still need
     * to make room for them at the end. at most a byte, though. */
    value <<= Math.min(8, bitsRemaining);

    /* extract all the bits remaining in the current byte. */
    int bitsWeNeed = (data[currentByteOffset] & (0xff >>> currentBitOffset));

    /* shift them so that only the number of bits we need remains. */
    bitsWeNeed >>>= (8 - currentBitOffset - Math.min(bitsRemaining, 8 - currentBitOffset));

    /* now combine the values. */
    value |= bitsWeNeed;

    /* reduce number of bits we still need. */
    bitsRemaining -= Math.min(bitsRemaining, 8 - currentBitOffset);

    /* the current byte is now depleted of bits we need. even if it isn’t
     * it doesn’t matter because if we needed fewer bits than we had this
     * routine is now finished. */
    currentBitOffset = 0;
    currentByteOffset++;
  }

  return value;
}
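
As a usage sketch, here is how such a routine could pull the sample rate (20 bits), the channel count (3 bits, stored minus one) and the bits per sample (5 bits, stored minus one) out of a STREAMINFO block. The class name and the four sample bytes are my own; the bytes stand in for bytes 10 to 13 of the block, and parseBits is restated compactly so that the example runs on its own.

```java
public class FlacBits {

    /** Compact restatement of the routine above: MSB-first, bit 0 is 0x80. */
    static long parseBits(byte[] data, int byteOffset, int bitOffset, int numberOfBits) {
        long value = 0;
        int currentByteOffset = byteOffset;
        int currentBitOffset = bitOffset;
        int bitsRemaining = numberOfBits;
        while (bitsRemaining > 0) {
            /* how many bits this byte can contribute. */
            int bitsFromThisByte = Math.min(bitsRemaining, 8 - currentBitOffset);
            /* make room for them at the low end of the value. */
            value <<= bitsFromThisByte;
            /* mask off the bits before the offset, drop the ones after the end. */
            int bits = data[currentByteOffset] & (0xff >>> currentBitOffset);
            bits >>>= 8 - currentBitOffset - bitsFromThisByte;
            value |= bits;
            bitsRemaining -= bitsFromThisByte;
            currentBitOffset = 0;
            currentByteOffset++;
        }
        return value;
    }

    public static void main(String[] args) {
        /* stand-in for bytes 10..13 of a STREAMINFO block. */
        byte[] data = {0x0A, (byte) 0xC4, 0x42, (byte) 0xF0};
        long sampleRate = parseBits(data, 0, 0, 20);
        long channels = parseBits(data, 2, 4, 3) + 1;
        long bitsPerSample = parseBits(data, 2, 7, 5) + 1;
        System.out.println(sampleRate + " Hz, " + channels + " channels, "
            + bitsPerSample + " bits per sample");
    }
}
```

Note that the bits-per-sample field starts at the last bit of one byte and continues into the next, which is exactly the case that makes a helper like this necessary.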
Categories
Java

This Method Has Been Called Before

Sometimes you have a method that should only be called once, and any further calls to this method are considered to be an error. And you thought you had made sure that this method is only called once. But every now and then you find something in your log file that points to the fact that this method has in fact been called twice. This is an outrage!

So, how do you track down those spurious calls that happen out of nowhere and that cannot possibly happen at all? Turns out, it’s not that hard: I thought up this method after reading a bit of log file from a fellow Freenet user. For a single method call it contained two different stack traces, one from the current call, and one from the last call. So after a couple of seconds I came to the conclusion that it has to happen a little bit like this:

public class SomeClass {
	/* Example UUID: dc033fa4-0102-4051-af9a-df9441312192 */
	private Exception firstCallException;
	public void someMethod() {
		if (firstCallException != null) {
			throw new IllegalStateException("Method already called!",
				firstCallException);
		}
		/* do stuff here */
		firstCallException = new Exception();
	}
}

What happens here is quite simple, really. firstCallException is initialized with null, so on the first execution of someMethod nothing spectacular happens, but at the end of the method firstCallException is set: a new Exception is created, which also captures the stack trace of the current thread. On the second call we realize that firstCallException has already been set, so this method must have been called before! We throw a new exception which uses the exception of the first method call as root cause; this lets us get information about both calls from the resulting exception. Also, this would allow us to chain an arbitrary number of exceptions, e.g. when you have to track all calls to a method.

(This method will fail in some way when accessed by multiple threads — I leave it as an exercise for the reader to make it thread-safe.)
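
For the impatient reader, one possible solution to that exercise is sketched below. It is only one way to do it and it rests on an assumption of mine: the call is recorded at the start of the method instead of at the end, so that two threads entering at the same time cannot both believe they are first. The class name and the exception message are made up.

```java
import java.util.concurrent.atomic.AtomicReference;

public class CallOnceGuard {

    /** Holds an exception created by the first call, or null if there was none yet. */
    private final AtomicReference<Exception> firstCall = new AtomicReference<>();

    public void someMethod() {
        /* capture the stack trace of this call before trying to register it. */
        Exception thisCall = new Exception("First call was here");

        /* only one thread can swap null for its exception; all others fail here. */
        if (!firstCall.compareAndSet(null, thisCall)) {
            throw new IllegalStateException("Method already called!", firstCall.get());
        }

        /* do stuff here */
    }
}
```

The price is that an Exception is created on every call, even the ones that are about to be rejected, which should not matter for a method that is supposed to run exactly once.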