Categories
Debugging Java

Breaking JarURLConnection

Consider the following:

URL url = new URL("jar:file:test.jar!/file.txt");
String one = readStringFromUrl(url);
String two = readStringFromUrl(url);
assertThat(one, equalTo(two));

This should totally be a green test, right? And if you’re using a current Java version (like 21, or 17, hell, even 11 will do), it is a green test.
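For anyone who wants to try this at home, here is a minimal sketch of how the snippet could be made runnable. The original helper is not shown here, so this `readStringFromUrl` implementation, along with `createTestJar` and `entryUrl`, is purely an assumption made for illustration. It needs Java 9 or newer because of `readAllBytes()`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

class JarUrlReadDemo {

    // Hypothetical helper: reads everything behind the URL as UTF-8 text.
    static String readStringFromUrl(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Hypothetical helper: writes a temporary JAR with a single entry, file.txt.
    static Path createTestJar(String content) throws IOException {
        Path jar = Files.createTempFile("test", ".jar");
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jarOut = new JarOutputStream(out)) {
            jarOut.putNextEntry(new JarEntry("file.txt"));
            jarOut.write(content.getBytes(StandardCharsets.UTF_8));
            jarOut.closeEntry();
        }
        return jar;
    }

    // Builds a jar: URL; note the "!/" separator between archive and entry.
    static URL entryUrl(Path jar, String entry) throws IOException {
        return URI.create("jar:" + jar.toUri() + "!/" + entry).toURL();
    }

    public static void main(String[] args) throws IOException {
        URL url = entryUrl(createTestJar("hello"), "file.txt");
        String one = readStringFromUrl(url);
        String two = readStringFromUrl(url);
        System.out.println(one.equals(two));
    }
}
```

With an ordinary JAR like this one, both reads return the same text; the interesting behaviour only shows up with the specially crafted JAR.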

With Java 8 and a special JAR file, however, it is not… which I found out going the other way, i.e. with a test that failed with Java 11 when it was fine with Java 8.

This test created a JAR file with two files in it; both had the same name but different content. This may or may not be technically valid but it is possible to do that, by convincing Java’s JarFile (using evil reflection magic) that it has never seen the name of the file after writing it once. This specially-crafted JAR file was then handed into a ClassLoader, which was in turn used with a ServiceLoader. With Java 8, it then returned two objects, of two different classes, as expected.

(In hindsight, that expectation was… weird, to say the least, as I knew that the ClassLoader method used by the ServiceLoader returned URL objects, and I knew that the URLs it returned would be identical; why I thought they could ever yield different data, I’m not entirely sure.)

With Java 11, two objects were returned but they were both of the same class. At first I suspected that the JAR was somehow written incorrectly but I could quickly verify that it was indeed very much written as intended, just like before.

The next step was to dive right into the ServiceLoader. It uses a lot of different iterators to create the iterator it returns but I finally managed to find the place where the URLs returned by the class loader were being opened, read, and parsed. The logic here changed a bit between Java 8 and Java 11 but even after running it dozens of times in the debugger and staring at it for prolonged amounts of time I did not achieve any more clarity.

Okay, so, what about the code that reads data from the URLs? That job is delegated all the way down to a JarURLConnection (of which there are two!), and lo and behold, apparently it’s using a cache! A comparison between Java 8 and Java 11 showed that this particular code didn’t really change, though, so that cannot be the reason for the observed differences in behaviour, either.

So I dug deeper into how the URL connections were actually made, and here I finally struck gold: The implementation of ZipFile.getEntry() changed drastically between Java 8 and Java 11, from a native implementation in Java 8 to a Java-based solution. The latter uses a hash table based on the hash of an entry’s name so requesting two entries with the same name (as I did in the test) would indeed return the same entry. I am guessing that Java 8’s native implementation, when asked to locate an entry, actually locates the next entry with the given name, i.e. it did not always start at the beginning of the ZIP file’s central directory. I can’t be bothered to locate the native source code to confirm that, though.
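To make the difference between the two lookup strategies a bit more tangible, here is a toy model. It is emphatically not the JDK's code: `HashedDirectory` merely mimics a name-keyed hash table, and `ResumingDirectory` models my unconfirmed guess about the native implementation (a scan that continues after the last hit). All class names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DuplicateEntryLookupModel {

    // A directory entry reduced to a name and some content.
    record Entry(String name, String content) {}

    // Hash-based lookup: one slot per name, so every lookup of the
    // same name resolves to the same entry.
    static class HashedDirectory {
        private final Map<String, Entry> index = new HashMap<>();

        HashedDirectory(List<Entry> entries) {
            for (Entry entry : entries) {
                index.putIfAbsent(entry.name(), entry);
            }
        }

        Entry getEntry(String name) {
            return index.get(name);
        }
    }

    // Guess at the old behaviour: the scan resumes after the previous
    // hit, so a repeated lookup can find the next entry with that name.
    static class ResumingDirectory {
        private final List<Entry> entries;
        private int cursor = 0;

        ResumingDirectory(List<Entry> entries) {
            this.entries = new ArrayList<>(entries);
        }

        Entry getEntry(String name) {
            for (int i = 0; i < entries.size(); i++) {
                int position = (cursor + i) % entries.size();
                if (entries.get(position).name().equals(name)) {
                    cursor = position + 1;
                    return entries.get(position);
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(new Entry("service", "A"), new Entry("service", "B"));
        HashedDirectory hashed = new HashedDirectory(entries);
        ResumingDirectory resuming = new ResumingDirectory(entries);
        System.out.println(hashed.getEntry("service").content() + hashed.getEntry("service").content());
        System.out.println(resuming.getEntry("service").content() + resuming.getEntry("service").content());
    }
}
```

In this model the hashed variant hands out the same entry twice while the resuming variant walks from the first duplicate to the second, which matches what I observed with Java 11 and Java 8 respectively.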

Categories
Uncategorized

Fixing a Git Repository with Broken Links

Just the other day one of my Git repositories developed problems and did things like this:

bombe@scandium:~/git/repo> git describe next
error: Could not read 706f6f1ff3dadccab7b037736d5ebf4eeadf7ccd
fatal: No tags can describe 'b39a41ddb264bbc673d731b81897583796657eca'.

git fsck reported (among other things removed for brevity):

broken link from commit 73ba74461ca2dd1da89f322aa87035f710fe4865
to commit 706f6f1ff3dadccab7b037736d5ebf4eeadf7ccd

Fortunately, even though this commit was quite recent it was already pushed to a remote repository so the correct data had to be there. But how do I get it back?

I tried naïvely to simply remove the branch and refetch it from the remote repository but that didn’t change anything. Of course it didn’t: the broken objects were still in Git’s object store and still referenced from the reflog, and I didn’t try to fiddle around with the settings for git gc to get it to remove objects more recent than two weeks.

Browsing the list of all available commands I stumbled upon pack-objects and unpack-objects which according to their respective man pages would do what I need. I cloned the remote repository next to the broken repository and started my rescue operation, after creating a copy of my broken repository:

bombe@scandium:~/git/repo> (cd ../repo2; echo 706f6f1ff3dadccab7b037736d5ebf4eeadf7ccd | git pack-objects --stdout) | git unpack-objects
Enumerating objects: 1, done.
Counting objects: 100% (1/1), done.
Total 1 (delta 0), reused 1 (delta 0), pack-reused 0
Unpacking objects: 100% (1/1), 882 bytes | 441.00 KiB/s, done.

Well, that looked nice. Did it work? What does git fsck say?

broken link from commit 706f6f1ff3dadccab7b037736d5ebf4eeadf7ccd
to commit 0f2af3a9ceede2efed3f5a477aba1cbd7fe7f5c0

Well, that’s a success! The formerly missing/broken commit was fixed but now pointed in turn to another missing or broken commit. In my case I had to repeat the above procedure a small number of times but finally git fsck was not reporting broken links anymore and every other command once again performed like I expected it to.

Categories
JPA Spring-Data

Sorting by a Non-Entity Field

In my work for SceneSat I recently came across the need to sort entries of a table by a field that does not exist in said table but is built on the fly from other fields. Specifically, depending on the status of a show I want to use either the scheduled start time or the actual start time to sort it by.

Categories
java.time Kotlin

Finding a Date From a Year and an ISO Week Number

Once a year inevitably the time arrives when you need to print a new calendar for your pinboard. You know, end of February. And this year I wanted to try a new layout, focused on weeks.

I vaguely remember coding some calendar generator that would output a simple SVG file but for the life of me I couldn’t find it. I knew I had to have had it last year because I printed last year’s calendar with it but the source code was nowhere to be found, and no combination of search phrases could make it show up.

That meant I had to solve all those tiny little problems again because no matter the API, dates are simply a horrible concept and should be abolished in favour of something simpler.

One especially pesky problem I needed to solve was to find a date given a year and an ISO week number. The only solution I could find was rather ugly:

DateTimeFormatter.ISO_WEEK_DATE
  .parse("${year}-W${"%02d".format(week)}-1")

Even though this is an incredibly terrible solution I am giving this type of calendar a try this year.

If you are interested in trying it as well, grab the PDF!
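For the week-number problem above, here is a sketch of an alternative that avoids building and parsing a string. It is not what I used for the calendar, and the class and method names are made up. It relies on the ISO rule that 4 January always falls into week 1, and it assumes the given week number is valid for the given year.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoField;
import java.time.temporal.IsoFields;

class IsoWeekDates {

    // Returns the Monday of the given ISO week in the given week-based year.
    static LocalDate mondayOfIsoWeek(int year, int week) {
        return LocalDate.of(year, 1, 4)                    // always lies in ISO week 1
            .with(IsoFields.WEEK_OF_WEEK_BASED_YEAR, week) // jump to the requested week
            .with(ChronoField.DAY_OF_WEEK, 1);             // move back to that week's Monday
    }

    public static void main(String[] args) {
        System.out.println(mondayOfIsoWeek(2024, 1));
    }
}
```

Whether this is any less ugly than the formatter is debatable, but at least it stays within the date types.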

Categories
Kotlin Testing

Mocking Without Mockito

In recent years I have grown to like Mockito, everybody’s favourite and invaluable testing helper, less and less. Using a bytecode-twiddling framework to get your code to behave well in tests makes it hard to reason about the tests, and because Mockito (and other mocking frameworks) tend to use a lot of global state to configure the mocks, it’s possible to break test code by refactorings that on “normal” code are totally safe.

I have adopted a different approach for software I develop that does not use Mockito or any other mocking framework but relies on basic JVM features like interfaces and their implementations.
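As a rough illustration of what "interfaces and their implementations" can look like in a test, here is a minimal sketch. None of these names come from real code of mine; `MessageSender`, `WelcomeService`, and `RecordingSender` are hypothetical and only show the general shape.

```java
import java.util.ArrayList;
import java.util.List;

class HandWrittenFakeDemo {

    // Hypothetical collaborator the production code depends on.
    interface MessageSender {
        void send(String recipient, String text);
    }

    // Hypothetical class under test; it only knows the interface.
    static class WelcomeService {
        private final MessageSender sender;

        WelcomeService(MessageSender sender) {
            this.sender = sender;
        }

        void welcome(String user) {
            sender.send(user, "Welcome, " + user + "!");
        }
    }

    // Hand-written test implementation that records every call.
    static class RecordingSender implements MessageSender {
        final List<String> sent = new ArrayList<>();

        @Override
        public void send(String recipient, String text) {
            sent.add(recipient + ": " + text);
        }
    }

    public static void main(String[] args) {
        RecordingSender sender = new RecordingSender();
        new WelcomeService(sender).welcome("alice");
        System.out.println(sender.sent);
    }
}
```

The test double is plain code, so renaming a method or changing a signature is caught by the compiler instead of by a failing test at runtime.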

Categories
Java Kotlin Streams / Collections

Getting a Ratio Using a Single Iteration

Recently I came across the question of how to calculate the ratio of elements of a stream that match a condition to the total number of elements in said stream. The naïve solution would be:

collection.stream()
  .filter(it -> it.completed())
  .count() / collection.size()

However, depending on where collection came from this might iterate it twice, producing unnecessary load or I/O while doing so. Also, we would need to cast one of the two values to a double, otherwise the integer division would make the result 0 pretty much all the time.

This is very ugly.

One possible solution is to use a Collector to do the ratio calculation for us. For this we would need a Collector that keeps track of the total number of elements and the number of elements that are completed. Lucky for us there is such a Collector already: the averaging collector. If we map all completed elements to a 1 and all not-completed elements to a 0, the result of the average will match the ratio we are expecting:

collection.stream()
    .collect(Collectors.averagingInt(it -> it.completed() ? 1 : 0));
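A self-contained version of the collector approach might look like the following sketch. The `Task` record and the `completedRatio` method are made up for the example; only `Collectors.averagingInt` is the actual JDK API.

```java
import java.util.List;
import java.util.stream.Collectors;

class CompletionRatio {

    // Hypothetical element type with a completed flag.
    record Task(String name, boolean completed) {}

    // Maps completed tasks to 1 and all others to 0, then averages in one pass.
    static double completedRatio(List<Task> tasks) {
        return tasks.stream()
            .collect(Collectors.averagingInt(task -> task.completed() ? 1 : 0));
    }

    public static void main(String[] args) {
        List<Task> tasks = List.of(
            new Task("write", true),
            new Task("test", false),
            new Task("review", true),
            new Task("ship", true));
        System.out.println(completedRatio(tasks));
    }
}
```

A nice side effect is that the averaging collector returns 0 for an empty stream, so there is no division by zero to worry about.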

In Kotlin, there is an average function defined on Iterable<Int> so you can do something very similar:

collection.map { if (it.completed()) 1 else 0 }.average()

You could even combine that with an extension method and turn it into:

collection.map(Foo::toCompletion).average()
…
private fun Foo.toCompletion() =
  if (completed()) 1 else 0