JDK 8 Source Reading, Day 1: String

  
  // Let's start with a method we use all the time in day-to-day development
  public String trim() {
        int len = value.length;
        int st = 0;
        char[] val = value;    /* avoid getfield opcode */

        while ((st < len) && (val[st] <= ' ')) {
            st++;
        }
        while ((st < len) && (val[len - 1] <= ' ')) {
            len--;
        }
        return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
    }

  // The last line above leads us to substring(), another familiar method
    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }

    // The last line above leads us to this String constructor
    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count <= 0) {
            if (count < 0) {
                throw new StringIndexOutOfBoundsException(count);
            }
            if (offset <= value.length) {
                this.value = "".value;
                return;
            }
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }

    // Again following the last line, we arrive at copyOfRange() in the Arrays class of the util package
    public static char[] copyOfRange(char[] original, int from, int to) {
        int newLength = to - from;
        if (newLength < 0)
            throw new IllegalArgumentException(from + " > " + to);
        char[] copy = new char[newLength];
        System.arraycopy(original, from, copy, 0,
                         Math.min(original.length - from, newLength));
        return copy;
    }

    // Finally we reach this static native method in System; from here on the implementation is not visible in Java source
    public static native void arraycopy(Object src,  int  srcPos,
                                        Object dest, int destPos,
                                        int length);

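The trim() we call does exactly what its documentation says: "Returns a string whose value is this string, with any leading and trailing whitespace removed." Simple enough, but the implementation shows how much work Java does for us behind the scenes. Note that the two loops strip every character less than or equal to ' ' (U+0020), which also covers control characters such as tabs and newlines, and that the method returns this unchanged when there is nothing to strip. Calling the trim() API is effortless. Compare that with when we were first learning C: how long would we have had to think, and how much code would we have had to write, to implement the same thing? I suspect this is a big reason why pragmatists who like to reuse existing work are so fond of Java.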
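To make the behaviour of the two loops and the final ternary in trim() concrete, here is a small sketch. The class TrimDemo and the helper returnsSameInstance are hypothetical names introduced for illustration only; they are not part of the JDK.

```java
class TrimDemo {
    // Hypothetical helper (not in the JDK): true when trim() handed back the receiver itself.
    static boolean returnsSameInstance(String s) {
        return s.trim() == s;
    }

    public static void main(String[] args) {
        // Everything <= ' ' (U+0020) is stripped, including tabs and newlines.
        System.out.println("[" + " \t hello \n".trim() + "]");   // [hello]

        // Nothing to strip: the ternary at the end of trim() returns `this`.
        System.out.println(returnsSameInstance("hello"));        // true

        // U+3000 (ideographic space) is greater than ' ', so trim() leaves it in place.
        System.out.println("\u3000hello".trim().length());       // 6
    }
}
```

The last line is the practical catch: trim() uses a simple numeric comparison, so whitespace characters above U+0020 survive it.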

Next, let's look at the family of valueOf() overloads. For now we only look at the method signatures.

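Notice that there is no overload taking a short or a byte parameter. That is easy to understand, because Java applies widening primitive conversions (byte -> short -> int), so a byte or short argument is accepted by valueOf(int). This is also a good moment to review Java's primitive data types.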
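The widening rule can be checked with a tiny experiment. In the sketch below, the which(...) overloads are hypothetical stand-ins that only mirror the shape of the valueOf family (int, long and char versions, none for byte or short); they are not JDK code.

```java
class ValueOfOverloadDemo {
    // Hypothetical overloads (not JDK code) mirroring the shape of String.valueOf:
    // int, long and char versions exist, but none for byte or short.
    static String which(int i)  { return "int"; }
    static String which(long l) { return "long"; }
    static String which(char c) { return "char"; }

    public static void main(String[] args) {
        byte b = 42;
        short s = 1000;

        // byte and short widen to int, and int is the most specific applicable overload.
        System.out.println(which(b));          // int
        System.out.println(which(s));          // int

        // The real API behaves the same way: these calls resolve to String.valueOf(int).
        System.out.println(String.valueOf(b)); // 42
        System.out.println(String.valueOf(s)); // 1000
    }
}
```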

If we pick one of them and look inside, there is something new to discover. Take this one, for example:

    // See how String converts a primitive long into a String
    public static String valueOf(long l) {
        return Long.toString(l);
    }

   // Naturally, this takes us to the Long class
    public static String toString(long i) {
        if (i == Long.MIN_VALUE)
            return "-9223372036854775808";
        int size = (i < 0) ? stringSize(-i) + 1 : stringSize(i);
        char[] buf = new char[size];
        getChars(i, size, buf);
        return new String(buf, true);
    }

   // What catches the eye above is this method
    static void getChars(long i, int index, char[] buf) {
        long q;
        int r;
        int charPos = index;
        char sign = 0;

        if (i < 0) {
            sign = '-';
            i = -i;
        }

        // Get 2 digits/iteration using longs until quotient fits into an int
        while (i > Integer.MAX_VALUE) {
            q = i / 100;
            // really: r = i - (q * 100);
            r = (int)(i - ((q << 6) + (q << 5) + (q << 2)));
            i = q;
            buf[--charPos] = Integer.DigitOnes[r];
            buf[--charPos] = Integer.DigitTens[r];
        }

        // Get 2 digits/iteration using ints
        int q2;
        int i2 = (int)i;
        while (i2 >= 65536) {
            q2 = i2 / 100;
            // really: r = i2 - (q * 100);
            r = i2 - ((q2 << 6) + (q2 << 5) + (q2 << 2));
            i2 = q2;
            buf[--charPos] = Integer.DigitOnes[r];
            buf[--charPos] = Integer.DigitTens[r];
        }

        // Fall thru to fast mode for smaller numbers
        // assert(i2 <= 65536, i2);
        for (;;) {
            q2 = (i2 * 52429) >>> (16+3);
            r = i2 - ((q2 << 3) + (q2 << 1));  // r = i2-(q2*10) ...
            buf[--charPos] = Integer.digits[r];
            i2 = q2;
            if (i2 == 0) break;
        }
        if (sign != 0) {
            buf[--charPos] = sign;
        }
    }

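Clearly, getChars() goes through quite a routine to turn a number into a string, and the code looks rather different from what we write in day-to-day development. Fortunately, the comments tell us roughly what it is doing, so we do not strictly "need" to follow every line. A Java beginner would probably be lost reading it, and even a developer with some years of experience might not follow all of it. What makes it unusual is the arithmetic: multiplication by 10 and 100 is written as shifts and additions, division by 10 is replaced with a multiply and a shift, and the digits are read from lookup tables.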
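The shift expressions in getChars() can be verified in isolation. The sketch below copies only the three arithmetic expressions into hypothetical helper methods (times100, times10 and div10 are illustrative names, not JDK methods) and checks them against ordinary multiplication and division over the range the JDK comment mentions.

```java
class DigitTrickDemo {
    // q * 100 written as 64q + 32q + 4q.
    static int times100(int q) { return (q << 6) + (q << 5) + (q << 2); }

    // q * 10 written as 8q + 2q.
    static int times10(int q) { return (q << 3) + (q << 1); }

    // i / 10 for 0 <= i <= 65536: 52429 / 2^19 is just above 0.1,
    // and >>> treats the 32-bit product as unsigned.
    static int div10(int i) { return (i * 52429) >>> (16 + 3); }

    public static void main(String[] args) {
        boolean allMatch = true;
        for (int i = 0; i <= 65536; i++) {
            if (div10(i) != i / 10 || times10(i) != i * 10 || times100(i) != i * 100) {
                allMatch = false;
            }
        }
        System.out.println(allMatch); // true
    }
}
```

This also explains the `while (i2 >= 65536)` loop in the source: the multiply-and-shift shortcut is only used once the value is small enough for the approximation to be exact.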

Next, another method we use often: toUpperCase(). Its description is just as simple: "Converts all of the characters in this String to upper case using the rules of the default locale." Let's look at the code.

public String toUpperCase() {
        return toUpperCase(Locale.getDefault());
}

// The method that does the actual work
public String toUpperCase(Locale locale) {
        if (locale == null) {
            throw new NullPointerException();
        }

        int firstLower;
        final int len = value.length;

        /* Now check if there are any characters that need to be changed. */
        scan: {
            for (firstLower = 0 ; firstLower < len; ) {
                int c = (int)value[firstLower];
                int srcCount;
                if ((c >= Character.MIN_HIGH_SURROGATE)
                        && (c <= Character.MAX_HIGH_SURROGATE)) {
                    c = codePointAt(firstLower);
                    srcCount = Character.charCount(c);
                } else {
                    srcCount = 1;
                }
                int upperCaseChar = Character.toUpperCaseEx(c);
                if ((upperCaseChar == Character.ERROR)
                        || (c != upperCaseChar)) {
                    break scan;
                }
                firstLower += srcCount;
            }
            return this;
        }

        /* result may grow, so i+resultOffset is the write location in result */
        int resultOffset = 0;
        char[] result = new char[len]; /* may grow */

        /* Just copy the first few upperCase characters. */
        System.arraycopy(value, 0, result, 0, firstLower);

        String lang = locale.getLanguage();
        boolean localeDependent =
                (lang == "tr" || lang == "az" || lang == "lt");
        char[] upperCharArray;
        int upperChar;
        int srcChar;
        int srcCount;
        for (int i = firstLower; i < len; i += srcCount) {
            srcChar = (int)value[i];
            if ((char)srcChar >= Character.MIN_HIGH_SURROGATE &&
                (char)srcChar <= Character.MAX_HIGH_SURROGATE) {
                srcChar = codePointAt(i);
                srcCount = Character.charCount(srcChar);
            } else {
                srcCount = 1;
            }
            if (localeDependent) {
                upperChar = ConditionalSpecialCasing.toUpperCaseEx(this, i, locale);
            } else {
                upperChar = Character.toUpperCaseEx(srcChar);
            }
            if ((upperChar == Character.ERROR)
                    || (upperChar >= Character.MIN_SUPPLEMENTARY_CODE_POINT)) {
                if (upperChar == Character.ERROR) {
                    if (localeDependent) {
                        upperCharArray =
                                ConditionalSpecialCasing.toUpperCaseCharArray(this, i, locale);
                    } else {
                        upperCharArray = Character.toUpperCaseCharArray(srcChar);
                    }
                } else if (srcCount == 2) {
                    resultOffset += Character.toChars(upperChar, result, i + resultOffset) - srcCount;
                    continue;
                } else {
                    upperCharArray = Character.toChars(upperChar);
                }

                /* Grow result if needed */
                int mapLen = upperCharArray.length;
                if (mapLen > srcCount) {
                    char[] result2 = new char[result.length + mapLen - srcCount];
                    System.arraycopy(result, 0, result2, 0, i + resultOffset);
                    result = result2;
                }
                for (int x = 0; x < mapLen; ++x) {
                    result[i + resultOffset + x] = upperCharArray[x];
                }
                resultOffset += (mapLen - srcCount);
            } else {
                result[i + resultOffset] = (char)upperChar;
            }
        }
        return new String(result, 0, len + resultOffset);
    }

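The sheer amount of code above may well scare you: so much code behind such a simple feature. Looking again more calmly, there is still something to learn. First there is the labeled block scan: { }, which lets the method return this early when no character needs to change. Then, as you might expect, most of the real work is delegated to the Character class, which is only natural. Locale is involved as well, which also makes sense given the class description: "A Locale object represents a specific geographical, political, or cultural region. An operation that requires a Locale to perform its task is called locale-sensitive and uses the Locale to tailor information for the user." The documentation even gives a thoughtful example: "For example, displaying a number is a locale-sensitive operation: the number should be formatted according to the customs and conventions of the user's native country, region, or culture." There is a corresponding toLowerCase() as well.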
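Two well-known cases show why the method needs a growable result array and a locale-dependent branch. This is only a small sketch; the class name UpperCaseDemo is made up for the example, and Locale.ROOT is passed explicitly so the output does not depend on the machine's default locale.

```java
import java.util.Locale;

class UpperCaseDemo {
    public static void main(String[] args) {
        // U+00DF (sharp s) has no single-char upper case form, so the result grows by one char.
        String street = "stra\u00DFe";
        String upper = street.toUpperCase(Locale.ROOT);
        System.out.println(upper + " " + street.length() + " -> " + upper.length()); // STRASSE 6 -> 7

        // Turkish is one of the three languages the source checks for ("tr", "az", "lt"):
        // lower case i maps to U+0130, capital I with a dot above.
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println("i".toUpperCase(turkish).equals("\u0130")); // true
        System.out.println("i".toUpperCase(Locale.ROOT).equals("I"));  // true
    }
}
```

This is also why passing an explicit Locale is a common habit for strings that are identifiers or protocol keywords rather than user-facing text.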

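Next, another method: public boolean startsWith(String prefix). It is easy to understand, but going one level deeper raises the question of how it is implemented. Let's go straight to the source.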

    public boolean startsWith(String prefix) {
        return startsWith(prefix, 0);
    }

    public boolean startsWith(String prefix, int toffset) {
        char ta[] = value;
        int to = toffset;
        char pa[] = prefix.value;
        int po = 0;
        int pc = prefix.value.length;
        // Note: toffset might be near -1>>>1.
        if ((toffset < 0) || (toffset > value.length - pc)) {
            return false;
        }
        while (--pc >= 0) {
            if (ta[to++] != pa[po++]) {
                return false;
            }
        }
        return true;
    }

This code is short, and it reads a bit like C because it is a plain character-by-character comparison. It does its job: "Tests if the substring of this string beginning at the specified index starts with the specified prefix." The same goes for boolean endsWith(String suffix), which internally just calls startsWith(suffix, value.length - suffix.value.length).
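The delegation from endsWith() to startsWith() can be reproduced with public API only. In this sketch, endsWithViaStartsWith is a hypothetical helper written for illustration; it uses length() instead of the private value array.

```java
class PrefixDemo {
    // Hypothetical re-creation of the delegation, using only public methods.
    static boolean endsWithViaStartsWith(String s, String suffix) {
        // A suffix longer than s gives a negative offset, which startsWith rejects.
        return s.startsWith(suffix, s.length() - suffix.length());
    }

    public static void main(String[] args) {
        System.out.println("hello".startsWith("ll", 2));                   // true
        System.out.println(endsWithViaStartsWith("Hello.java", ".java"));  // true
        System.out.println(endsWithViaStartsWith("a", "abc"));             // false
        System.out.println("Hello.java".endsWith(".java"));                // true
    }
}
```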

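If startsWith() is not something you use much, I'm sure String[] split(String regex) is: "Splits this string around matches of the given regular expression." The documentation gives an interesting example. Looking at it closely, I felt I had never really mastered the method, in particular the o row below, so I hurried to try it myself.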

The string "boo:and:foo", for example, yields the following results with these expressions:

Regex    Result
:        { "boo", "and", "foo" }
o        { "b", "", ":and:f" }

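It really does behave that way: trailing empty strings are dropped, which is why nothing appears after ":and:f" even though two more o characters follow it. Clearly I still need to read more!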
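The documented example is easy to reproduce. The sketch below also calls the two-argument split(regex, limit) with a negative limit, which keeps the trailing empty strings that the one-argument version discards. SplitDemo is just an illustrative class name.

```java
import java.util.Arrays;

class SplitDemo {
    public static void main(String[] args) {
        String s = "boo:and:foo";

        System.out.println(Arrays.toString(s.split(":")));  // [boo, and, foo]

        // The empty string between the two adjacent 'o's is kept,
        // but the empty strings after the final "oo" are removed.
        System.out.println(Arrays.toString(s.split("o")));  // [b, , :and:f]

        // A negative limit keeps the trailing empty strings as well.
        System.out.println(s.split("o", -1).length);        // 5
    }
}
```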

Another method we use a lot in development is String replace(CharSequence target, CharSequence replacement), and it has a few siblings in the same family.

For this group, let's again go straight to the source:

    public String replace(CharSequence target, CharSequence replacement) {
        return Pattern.compile(target.toString(), Pattern.LITERAL)
                .matcher(this)
                .replaceAll(Matcher.quoteReplacement(replacement.toString()));
    }
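The effect of Pattern.LITERAL and Matcher.quoteReplacement in the replace() source is easiest to see next to replaceAll(), which treats its first argument as a regular expression. A minimal comparison (ReplaceDemo is an illustrative class name):

```java
class ReplaceDemo {
    public static void main(String[] args) {
        // replace(): the target is literal text, so "." only matches a dot.
        System.out.println("a.b.c".replace(".", "-"));       // a-b-c

        // replaceAll(): the first argument is a regex, and "." matches any character.
        System.out.println("a.b.c".replaceAll(".", "-"));    // -----

        // quoteReplacement() means "$" in the replacement is not read as a group reference.
        System.out.println("cost: X".replace("X", "$5"));    // cost: $5
    }
}
```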

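This mainly involves the two regular-expression classes Pattern and Matcher, plus the CharSequence interface. String implements CharSequence, which is why the target and replacement we pass in are usually plain strings. A quick glance at the declaration of String: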

public final class String
implements java.io.Serializable, Comparable<String>, CharSequence {

As for the Pattern and Matcher classes, a brief look will do for now.

Pattern: A compiled representation of a regular expression.
A regular expression, specified as a string, must first be compiled into an instance of this class. The resulting pattern can then be used to create a Matcher object that can match arbitrary character sequences against the regular expression. All of the state involved in performing a match resides in the matcher, so many matchers can share the same pattern.
A typical invocation sequence is thus
    Pattern p = Pattern.compile("a*b");
    Matcher m = p.matcher("aaaaab");
    boolean b = m.matches();
A matches method is defined by this class as a convenience for when a regular expression is used just once. This method compiles an expression and matches an input sequence against it in a single invocation. The statement
    boolean b = Pattern.matches("a*b", "aaaaab");
is equivalent to the three statements above, though for repeated matches it is less efficient since it does not allow the compiled pattern to be reused.

It also mentions thread safety:
Instances of this class are immutable and are safe for use by multiple concurrent threads. Instances of the Matcher class are not safe for such use.

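I think strings and regular expressions are hard to discuss separately, and both come up constantly in development. The Pattern class documentation also explains many of the common regular-expression constructs. Remembering the handful we use most, and knowing where to look up the rest, is enough.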
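The documentation quoted above suggests a common pattern: compile once, share the immutable Pattern, and create a new Matcher for each use. The following is one possible sketch of that idea; the class PatternReuseDemo, the constant A_STAR_B and the method matches are hypothetical names chosen for the example.

```java
import java.util.regex.Pattern;

class PatternReuseDemo {
    // Compiled once. Pattern instances are immutable and safe to share between threads.
    private static final Pattern A_STAR_B = Pattern.compile("a*b");

    static boolean matches(String input) {
        // A fresh Matcher per call, because a Matcher holds mutable match state
        // and is not safe for concurrent use.
        return A_STAR_B.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("aaaaab")); // true
        System.out.println(matches("b"));      // true
        System.out.println(matches("aba"));    // false
    }
}
```

Compared with calling Pattern.matches(regex, input) each time, this avoids recompiling the expression on every call.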

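Having looked at replacement, there is also string matching with boolean matches(String regex), whose implementation is simply return Pattern.matches(regex, this);. Several other methods belong to the same family.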

Now let's look at a few methods we rarely use. They were added in Java 1.5 and deal with code points.

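A symbol such as 😁 is not something we normally type directly in code, but it can be represented in Unicode, which prompted a quick test and the summary below.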

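  • Code point and code unit are terms from the Unicode standard, whose core is a coded character set.
  • In Java, one character corresponds to exactly one code point, but it may take more than one code unit.
    A code unit in Java is one char, which is two bytes. The emoji 😝, for example, is a single code point but occupies two chars.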
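The difference between code points and code units shows up as soon as we compare length() with codePointCount(). This is a minimal sketch; the emoji U+1F601 is written as its surrogate pair so the example does not depend on source file encoding, and CodePointDemo is an illustrative class name.

```java
class CodePointDemo {
    public static void main(String[] args) {
        // U+1F601 written as its surrogate pair: high D83D, low DE01.
        String grin = "\uD83D\uDE01";

        System.out.println(grin.length());                           // 2 (code units, i.e. chars)
        System.out.println(grin.codePointCount(0, grin.length()));   // 1 (code points)
        System.out.println(Integer.toHexString(grin.codePointAt(0))); // 1f601
        System.out.println(Character.charCount(0x1F601));            // 2
        System.out.println(Character.isHighSurrogate(grin.charAt(0))); // true
    }
}
```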

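Back to the main theme, the methods we commonly use in development: indexOf() and lastIndexOf(), which are usually used together with the substring methods. Both come in several overloads.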

If we pick one and look at the implementation, the algorithmic flavor shows up again:

    public int indexOf(int ch, int fromIndex) {
        final int max = value.length;
        if (fromIndex < 0) {
            fromIndex = 0;
        } else if (fromIndex >= max) {
            // Note: fromIndex might be near -1>>>1.
            return -1;
        }

        if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
            // handle most cases here (ch is a BMP code point or a
            // negative value (invalid code point))
            final char[] value = this.value;
            for (int i = fromIndex; i < max; i++) {
                if (value[i] == ch) {
                    return i;
                }
            }
            return -1;
        } else {
            return indexOfSupplementary(ch, fromIndex);
        }
    }

    private int indexOfSupplementary(int ch, int fromIndex) {
        if (Character.isValidCodePoint(ch)) {
            final char[] value = this.value;
            final char hi = Character.highSurrogate(ch);
            final char lo = Character.lowSurrogate(ch);
            final int max = value.length - 1;
            for (int i = fromIndex; i < max; i++) {
                if (value[i] == hi && value[i + 1] == lo) {
                    return i;
                }
            }
        }
        return -1;
    }
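The two branches of indexOf(int ch, int fromIndex) can be exercised with a string that mixes ordinary characters and one supplementary character. A small sketch under that assumption (IndexOfDemo is an illustrative class name):

```java
class IndexOfDemo {
    public static void main(String[] args) {
        // "ab", then U+1F601 as a surrogate pair, then "c": five chars in total.
        String s = "ab\uD83D\uDE01c";

        // BMP branch: a plain scan over the char array.
        System.out.println(s.indexOf('b'));          // 1

        // Supplementary branch: looks for the high and low surrogate side by side.
        System.out.println(s.indexOf(0x1F601));      // 2

        // Results are char indexes, so 'c' sits at 4 because the emoji takes two chars.
        System.out.println(s.indexOf('c'));          // 4

        // Starting past the match finds nothing.
        System.out.println(s.indexOf(0x1F601, 3));   // -1
    }
}
```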

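At this point we have seen most of the methods of the String class. Let's return to some more basic ones: compareToIgnoreCase(String str), compareTo(String anotherString),
int hashCode(), and boolean equals(Object anObject). We'll go through their source in reverse order.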

    public boolean equals(Object anObject) {
        if (this == anObject) {
            return true;
        }
        if (anObject instanceof String) {
            String anotherString = (String)anObject;
            int n = value.length;
            if (n == anotherString.value.length) {
                char v1[] = value;
                char v2[] = anotherString.value;
                int i = 0;
                while (n-- != 0) {
                    if (v1[i] != v2[i])
                        return false;
                    i++;
                }
                return true;
            }
        }
        return false;
    }

    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }

    public int compareTo(String anotherString) {
        int len1 = value.length;
        int len2 = anotherString.value.length;
        int lim = Math.min(len1, len2);
        char v1[] = value;
        char v2[] = anotherString.value;

        int k = 0;
        while (k < lim) {
            char c1 = v1[k];
            char c2 = v2[k];
            if (c1 != c2) {
                return c1 - c2;
            }
            k++;
        }
        return len1 - len2;
    }

    public int compareToIgnoreCase(String str) {
        return CASE_INSENSITIVE_ORDER.compare(this, str);
    }
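The recurrence in hashCode() and the return values of compareTo() are simple enough to check by hand. In the sketch below, manualHash is a hypothetical helper that repeats the same h = 31 * h + c loop using public API; it is not part of String.

```java
class HashCompareDemo {
    // Hypothetical helper: recomputes the hash with the same recurrence as String.hashCode().
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // 'a' is 97 and 'b' is 98, so the hash is 31 * 97 + 98.
        System.out.println(manualHash("ab"));                           // 3105
        System.out.println(manualHash("hello") == "hello".hashCode()); // true

        // First difference is at index 2: 'p' (112) minus 'r' (114).
        System.out.println("apple".compareTo("apricot"));              // -2

        // One string is a prefix of the other: the result is the length difference, 3 - 5.
        System.out.println("app".compareTo("apple"));                  // -2
    }
}
```

The two compareTo results happen to be equal, which is a reminder that only the sign of the result is meaningful to callers.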

The code above is again mostly straightforward algorithms. The last method is the most interesting one, so let's dig a little deeper.

    public static final Comparator<String> CASE_INSENSITIVE_ORDER
                                        = new CaseInsensitiveComparator();
    private static class CaseInsensitiveComparator
            implements Comparator<String>, java.io.Serializable {
        // use serialVersionUID from JDK 1.2.2 for interoperability
        private static final long serialVersionUID = 8575799808933029326L;

        public int compare(String s1, String s2) {
            int n1 = s1.length();
            int n2 = s2.length();
            int min = Math.min(n1, n2);
            for (int i = 0; i < min; i++) {
                char c1 = s1.charAt(i);
                char c2 = s2.charAt(i);
                if (c1 != c2) {
                    c1 = Character.toUpperCase(c1);
                    c2 = Character.toUpperCase(c2);
                    if (c1 != c2) {
                        c1 = Character.toLowerCase(c1);
                        c2 = Character.toLowerCase(c2);
                        if (c1 != c2) {
                            // No overflow because of numeric promotion
                            return c1 - c2;
                        }
                    }
                }
            }
            return n1 - n2;
        }

        /** Replaces the de-serialized object. */
        private Object readResolve() { return CASE_INSENSITIVE_ORDER; }
    }

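The public static CASE_INSENSITIVE_ORDER constant, backed by the private static nested class CaseInsensitiveComparator, feels more characteristically Java than the algorithm-style methods above: at the very least, we can see encapsulation at work.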
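Because the comparator is exposed as a public constant, it can be passed anywhere a Comparator<String> is accepted. A short usage sketch (CaseInsensitiveDemo is an illustrative class name):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CaseInsensitiveDemo {
    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("banana", "apple", "Cherry"));

        // Natural order uses compareTo(): upper case 'C' (67) sorts before lower case 'a' (97).
        Collections.sort(words);
        System.out.println(words); // [Cherry, apple, banana]

        // The shared comparator ignores case differences.
        Collections.sort(words, String.CASE_INSENSITIVE_ORDER);
        System.out.println(words); // [apple, banana, Cherry]

        // compareToIgnoreCase() is a thin wrapper around the same comparator.
        System.out.println("HELLO".compareToIgnoreCase("hello")); // 0
    }
}
```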

Next, a look at the constructors.

There is not much to say about most of them. Just note this pair:

    public String(byte bytes[]) {
        this(bytes, 0, bytes.length);
    }

    public byte[] getBytes() {
        return StringCoding.encode(value, 0, value.length);
    }

This is also easy to understand. These methods are typically used when writing data-conversion code.
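The two methods shown above use the platform's default charset. For conversion code it is often safer to name the charset explicitly, and String has overloads for that: getBytes(Charset) and String(byte[], Charset). A round-trip sketch using those overloads (BytesDemo is an illustrative class name, and UTF-8 is simply the charset chosen for the example):

```java
import java.nio.charset.StandardCharsets;

class BytesDemo {
    public static void main(String[] args) {
        // U+4E2D is a single char, but it needs three bytes in UTF-8.
        String text = "\u4E2D";

        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 3

        // Decoding with the same charset restores the original string.
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(decoded.equals(text)); // true
    }
}
```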

As for String concat(String str), we generally use + instead, though I remember writing an implementation of it in C.

There is one more method I rarely use, simply because I have not needed much string formatting. Still, these two overloads are worth a look:

    public static String format(String format, Object... args) {
        return new Formatter().format(format, args).toString();
    }

    public static String format(Locale l, String format, Object... args) {
        return new Formatter(l).format(format, args).toString();
    }
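The Locale overload matters as soon as numbers are involved. A brief sketch of both overloads' behaviour, passing the locale explicitly so the output does not depend on the machine's default (FormatDemo is an illustrative class name):

```java
import java.util.Locale;

class FormatDemo {
    public static void main(String[] args) {
        // Same format string, different grouping and decimal separators.
        System.out.println(String.format(Locale.US, "%,.2f", 1234.5));      // 1,234.50
        System.out.println(String.format(Locale.GERMANY, "%,.2f", 1234.5)); // 1.234,50

        // Non-numeric conversions are unaffected by the locale here.
        System.out.println(String.format(Locale.ROOT, "%s has %d items", "cart", 3)); // cart has 3 items
    }
}
```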

Finally, I want to mention two methods introduced in 1.8.

    public static String join(CharSequence delimiter,
            Iterable<? extends CharSequence> elements) {
        Objects.requireNonNull(delimiter);
        Objects.requireNonNull(elements);
        StringJoiner joiner = new StringJoiner(delimiter);
        for (CharSequence cs: elements) {
            joiner.add(cs);
        }
        return joiner.toString();
    }

    public static String join(CharSequence delimiter, CharSequence... elements) {
        Objects.requireNonNull(delimiter);
        Objects.requireNonNull(elements);
        // Number of elements not likely worth Arrays.stream overhead.
        StringJoiner joiner = new StringJoiner(delimiter);
        for (CharSequence cs: elements) {
            joiner.add(cs);
        }
        return joiner.toString();
    }

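These methods are quite helpful in practice, because we often need to concatenate a collection of strings with a specific delimiter, and previously we had to strip the extra trailing delimiter by hand. Most of the credit, of course, goes to the StringJoiner class.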
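For comparison, here is a sketch of the manual approach next to String.join(). The helper joinByHand is a hypothetical example of the older style, written only for this comparison.

```java
import java.util.Arrays;
import java.util.List;

class JoinDemo {
    // Hypothetical pre-Java-8 helper: append every element plus delimiter, then cut the extra one.
    static String joinByHand(List<String> parts, String delimiter) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part).append(delimiter);
        }
        if (sb.length() > 0) {
            sb.setLength(sb.length() - delimiter.length());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", "b", "c");

        System.out.println(joinByHand(parts, ", "));            // a, b, c
        System.out.println(String.join(", ", parts));           // a, b, c  (Iterable overload)
        System.out.println(String.join("-", "x", "y", "z"));    // x-y-z    (varargs overload)
    }
}
```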

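That's it. The String source code that once seemed like so much has now been skimmed through more or less in one go. I hope we both got something out of it. Have a good night! :)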
