Getting a Ratio Using a Single Iteration

So, recently I came across the question of how to calculate the ratio of elements of a stream to the total number of elements in said stream. The naïve solution would be:

collection.stream().filter(it -> it.completed()).count() / collection.size()

However, depending on where collection came from this might iterate it twice, producing unnecessary load while doing so. Also, we would need to cast one of the two values to a double otherwise your result would be 0 pretty much all the time.

This is very ugly.

One possible solution is to use a Collector to do the ratio calculation for us. For this we would need a Collector that keeps track of the total number of elements and the number of elements that are completed. Lucky for us there is such a Collector already: the averaging collector. If we map all completed elements to a 1 and all not-completed elements to a 0, the result of the average will match the ratio we are expecting:

collection.stream()
    .collect(Collectors.averagingInt(it -> it.completed() ? 1 : 0));

In Kotlin, there is an average function defined on Iterable<Int> so you can do something very similar:

collection.map { if (it.completed()) 1 else 0 }.average()

You could even combine that with an extension method and turn it into:

collection.map(Foo::toCompletion).average()
…
private fun Foo.toCompletion() = if (completed()) 1 else 0

Share and Enjoy

  • Facebook
  • Twitter
  • Delicious
  • LinkedIn
  • StumbleUpon
  • Add to favorites
  • Email
  • RSS

Bit Fiddlery

Recently I tried to parse FLAC headers. In the STREAMINFO block there are several fields that have a width of non-multiple-of-8 bits so I had to create a function that could read a number of bits starting at an arbitrary bit.

/**
 * Reads numberOfBits bits from the given buffer, starting at the given byte
 * and bit offsets. Bits are assumed to be numbered MSB-first, i.e. the
 * highest bit in a byte (0x80) is considered bit 0.
 * 
 * Example UUID: B24E931F-FFC5-4F3C-A6FF-E667BDB5F062
 */
long parseBits(byte[] data, int byteOffset, int bitOffset, int numberOfBits) {
  long value = 0;
  int currentByteOffset = byteOffset;
  int currentBitOffset = bitOffset;
  int bitsRemaining = numberOfBits;
 
  /* while we still need some bits... */
  while (bitsRemaining > 0) {
 
    /* shift the current value by the number of bits we still need
     * to make room for them at the end. at most a byte, though. */
    value <<= Math.min(8, remainingBits);
 
    /* extract all the bits remaining in the current byte. */
    int bitsWeNeed = (data[currentByteOffset] & (0xff >>> currentBitOffset));
 
    /* shift them so that only the number of bits we need remains. */
    bitsWeNeed >>= (8 - currentBitOffset - Math.min(bitsRemaining, 8 - currentBitOffset));
 
    /* now combine the values. */
    value |= bitsWeNeed;
 
    /* reduce number of bits we still need. */
    bitsRemaining -= Math.min(bitsRemaining, 8 - currentBitOffset);
 
    /* the current byte is now depleted of bits we need. even if it isn’t
     * it doesn’t matter because if we needed less bits than we had this
     * routine is now finished. */
    currentBitOffset = 0;
    currentByteOffset++;
  }
 
  return value;
}

Share and Enjoy

  • Facebook
  • Twitter
  • Delicious
  • LinkedIn
  • StumbleUpon
  • Add to favorites
  • Email
  • RSS

Regular Expressions: Greedy vs. Reluctant Quantifiers

The other day I wanted to extract a part of a filename and was quite dumbfounded when the extracted part came back empty. The regular expression used to extract the part using a capturing group was:

[A-Za-z0-9]*_?([0-9]*)\..*

The filenames fed into this expressing where mostly of the pattern someText_12345.data. However, occasionally a file named 12345.data would pass along and even though its name matched the expression, the capturing group came up empty. But… but… why?

Find out →

Share and Enjoy

  • Facebook
  • Twitter
  • Delicious
  • LinkedIn
  • StumbleUpon
  • Add to favorites
  • Email
  • RSS

This Method Has Been Called Before

Sometimes you have a method that should only be called once, and any further calls to this method are considered to be an error. And you thought you had made sure that this method is only called once. But every now and then you find something in your log file that points to the fact that this method has in fact been called twice. This is an outrage!

So, how do you track down those spurious calls that happen out of nowhere and that can not possibly happen at all? Turns out, it’s not that hard: I thought up this method after reading a bit of log file from a fellow Freenet user. For a single method call it contained two different stack traces, one from the current call, and one from the last call. So after a couple of seconds I came to the conclusion that it has to happen a little bit like this:

public class SomeClass {
	/* Example UUID: dc033fa4-0102-4051-af9a-df9441312192 */
	private Exception firstCallException;
	public void someMethod() {
		if (firstCallException != null) {
			throw new Exception("Method already called!",
				firstCallException);
		}
		/* do stuff here */
		firstCallException = new Exception();
	}
}

What happens here is quite simple, really. first­Call­Exception is initialized with null, so on the first execution of some­Method nothing spectacular happens but at the end of the method first­Call­Exception is set: a new Exception is created which also initializes a stack trace for the current thread. On the second call we realize that first­Call­Exception has already been set so this method must have been called before! We throw a new Exception which uses the exception of the first method call as root cause; this enables us to get information about both exceptions from the resulting exceptions. Also, this would allow us to chain an arbitrary number of exceptions, e.g. when you have to track all calls to a method.

(This method will fail in some way when accessed by multiple threads — I leave it as an exercise for the reader to make it thread-safe.)

Share and Enjoy

  • Facebook
  • Twitter
  • Delicious
  • LinkedIn
  • StumbleUpon
  • Add to favorites
  • Email
  • RSS