Getting a Ratio Using a Single Iteration

So, recently I came across the question of how to calculate the ratio of elements of a stream to the total number of elements in said stream. The naïve solution would be:

collection.stream().filter(it -> it.completed()).count() / collection.size()

However, depending on where collection came from this might iterate it twice, producing unnecessary load while doing so. Also, we would need to cast one of the two values to a double otherwise your result would be 0 pretty much all the time.

This is very ugly.

One possible solution is to use a Collector to do the ratio calculation for us. For this we would need a Collector that keeps track of the total number of elements and the number of elements that are completed. Lucky for us there is such a Collector already: the averaging collector. If we map all completed elements to a 1 and all not-completed elements to a 0, the result of the average will match the ratio we are expecting:

collection.stream()
    .collect(Collectors.averagingInt(it -> it.completed() ? 1 : 0));

In Kotlin, there is an average function defined on Iterable<Int> so you can do something very similar:

collection.map { if (it.completed()) 1 else 0 }.average()

You could even combine that with an extension method and turn it into:

collection.map(Foo::toCompletion).average()
…
private fun Foo.toCompletion() = if (completed()) 1 else 0

Share and Enjoy

  • Facebook
  • Twitter
  • Delicious
  • LinkedIn
  • StumbleUpon
  • Add to favorites
  • Email
  • RSS

Bit Fiddlery

Recently I tried to parse FLAC headers. In the STREAMINFO block there are several fields that have a width of non-multiple-of-8 bits so I had to create a function that could read a number of bits starting at an arbitrary bit.

/**
 * Reads numberOfBits bits from the given buffer, starting at the given byte
 * and bit offsets. Bits are assumed to be numbered MSB-first, i.e. the
 * highest bit in a byte (0x80) is considered bit 0.
 * 
 * Example UUID: B24E931F-FFC5-4F3C-A6FF-E667BDB5F062
 */
long parseBits(byte[] data, int byteOffset, int bitOffset, int numberOfBits) {
  long value = 0;
  int currentByteOffset = byteOffset;
  int currentBitOffset = bitOffset;
  int bitsRemaining = numberOfBits;
 
  /* while we still need some bits... */
  while (bitsRemaining > 0) {
 
    /* shift the current value by the number of bits we still need
     * to make room for them at the end. at most a byte, though. */
    value <<= Math.min(8, remainingBits);
 
    /* extract all the bits remaining in the current byte. */
    int bitsWeNeed = (data[currentByteOffset] & (0xff >>> currentBitOffset));
 
    /* shift them so that only the number of bits we need remains. */
    bitsWeNeed >>= (8 - currentBitOffset - Math.min(bitsRemaining, 8 - currentBitOffset));
 
    /* now combine the values. */
    value |= bitsWeNeed;
 
    /* reduce number of bits we still need. */
    bitsRemaining -= Math.min(bitsRemaining, 8 - currentBitOffset);
 
    /* the current byte is now depleted of bits we need. even if it isn’t
     * it doesn’t matter because if we needed less bits than we had this
     * routine is now finished. */
    currentBitOffset = 0;
    currentByteOffset++;
  }
 
  return value;
}

Share and Enjoy

  • Facebook
  • Twitter
  • Delicious
  • LinkedIn
  • StumbleUpon
  • Add to favorites
  • Email
  • RSS

Regular Expressions: Greedy vs. Reluctant Quantifiers

The other day I wanted to extract a part of a filename and was quite dumbfounded when the extracted part came back empty. The regular expression used to extract the part using a capturing group was:

[A-Za-z0-9]*_?([0-9]*)\..*

The filenames fed into this expressing where mostly of the pattern someText_12345.data. However, occasionally a file named 12345.data would pass along and even though its name matched the expression, the capturing group came up empty. But… but… why?

Find out →

Share and Enjoy

  • Facebook
  • Twitter
  • Delicious
  • LinkedIn
  • StumbleUpon
  • Add to favorites
  • Email
  • RSS