Get Substring from String in Java

Last modified: December 21, 2019

1. Overview


In this quick tutorial, we’ll focus on the substring functionality of Strings in Java.


We’ll mostly use the methods from the String class and a few from Apache Commons’ StringUtils class.


In all of the following examples, we’re going to use this simple String:


String text = "Julia Evans was born on 25-09-1984. "
  + "She is currently living in the USA (United States of America).";

2. Basics of substring


Let’s start with a very simple example here – extracting a substring with the start index:


assertEquals("USA (United States of America).", 
  text.substring(67));

Note how we extracted Julia’s country of residence in our example here.


We can also specify an end index; without one, substring goes all the way to the end of the String.


Let’s specify one to get rid of the extra dot at the end of the example above:


assertEquals("USA (United States of America)", 
  text.substring(67, text.length() - 1));
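To make these index rules concrete, here is a small self-contained sketch. The begin index is inclusive and the end index is exclusive, so the result always has end - begin characters, and an invalid range throws an IndexOutOfBoundsException. The class name and the isValidRange helper are hypothetical; the helper only restates the bounds conditions from the String.substring Javadoc.

```java
public class SubstringRanges {

    static final String TEXT = "Julia Evans was born on 25-09-1984. "
        + "She is currently living in the USA (United States of America).";

    // Hypothetical helper: true if s.substring(begin, end) would succeed.
    // Mirrors the documented bounds checks of String.substring.
    static boolean isValidRange(String s, int begin, int end) {
        return begin >= 0 && end <= s.length() && begin <= end;
    }

    public static void main(String[] args) {
        // begin is inclusive, end is exclusive: the result has end - begin chars
        String country = TEXT.substring(67, 70);
        System.out.println(country + " " + country.length()); // USA 3

        // Without an end index, substring runs to the end of the String
        System.out.println(TEXT.substring(67).equals(TEXT.substring(67, TEXT.length()))); // true

        // begin > end is not allowed and throws an exception
        System.out.println(isValidRange(TEXT, 70, 67)); // false
        try {
            TEXT.substring(70, 67);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("invalid range rejected");
        }
    }
}
```

A check like isValidRange is mostly useful when the indices come from user input or from indexOf, which returns -1 when nothing is found.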

In the examples above, we’ve used the exact position to extract the substring.


2.1. Getting a Substring Starting at a Specific Character


In case the position needs to be calculated dynamically based on a character or String, we can make use of the indexOf method:


assertEquals("United States of America", 
  text.substring(text.indexOf('(') + 1, text.indexOf(')')));
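One caveat with this approach: indexOf returns -1 when the character isn’t found, and passing that into substring either throws or silently extracts the wrong range. The following sketch guards against both cases. The between helper and its Optional return type are our own choices for illustration, not part of the String API; it uses the indexOf(int ch, int fromIndex) overload to look for the closing character only after the opening one.

```java
import java.util.Optional;

public class IndexOfExtraction {

    // Hypothetical helper: the text strictly between the first `open`
    // character and the next `close` character after it, if both exist.
    static Optional<String> between(String text, char open, char close) {
        int start = text.indexOf(open);
        if (start < 0) {
            return Optional.empty(); // opening character not found
        }
        // Look for the closing character only after the opening one
        int end = text.indexOf(close, start + 1);
        if (end < 0) {
            return Optional.empty(); // closing character not found
        }
        return Optional.of(text.substring(start + 1, end));
    }

    public static void main(String[] args) {
        String text = "Julia Evans was born on 25-09-1984. "
            + "She is currently living in the USA (United States of America).";

        System.out.println(between(text, '(', ')').orElse("not found")); // United States of America
        System.out.println(between(text, '[', ']').orElse("not found")); // not found
    }
}
```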

A similar method that can help us locate our substring is lastIndexOf. Let’s use lastIndexOf to extract the year “1984”. It’s the portion of text between the last dash and the first dot:


assertEquals("1984",
  text.substring(text.lastIndexOf('-') + 1, text.indexOf('.')));
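Combining indexOf and lastIndexOf, we can pull out all three parts of the date. This is a hypothetical sketch that relies on assumptions specific to our example text: the only dashes in the text belong to the date, the day is exactly the two characters before the first dash, and a dot follows the year.

```java
public class DateParts {

    // Hypothetical helper, valid only under the assumptions stated above:
    // the date's dashes are the only dashes in the text, the day is the two
    // characters before the first dash, and a '.' follows the year.
    static String[] dateParts(String text) {
        int firstDash = text.indexOf('-');
        int lastDash = text.lastIndexOf('-');

        String day = text.substring(firstDash - 2, firstDash);
        String month = text.substring(firstDash + 1, lastDash);
        // Search for the dot only after the last dash
        String year = text.substring(lastDash + 1, text.indexOf('.', lastDash));
        return new String[] { day, month, year };
    }

    public static void main(String[] args) {
        String text = "Julia Evans was born on 25-09-1984. "
            + "She is currently living in the USA (United States of America).";

        String[] parts = dateParts(text);
        // prints day=25, month=09, year=1984
        System.out.println("day=" + parts[0] + ", month=" + parts[1] + ", year=" + parts[2]);
    }
}
```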

Both indexOf and lastIndexOf can take a character or a String as a parameter. Let’s extract the text “USA” together with the rest of the text in parentheses:


assertEquals("USA (United States of America)",
  text.substring(text.indexOf("USA"), text.indexOf(')') + 1));
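Both methods also have overloads that take a starting index, such as indexOf(String str, int fromIndex), which lets us keep searching after a previous match. The sketch below uses a hypothetical positionsOf helper to collect the start index of every non-overlapping occurrence of a String.

```java
import java.util.ArrayList;
import java.util.List;

public class Occurrences {

    // Hypothetical helper: start indices of all non-overlapping matches,
    // resuming each search just past the previous match
    static List<Integer> positionsOf(String text, String target) {
        List<Integer> positions = new ArrayList<>();
        if (target.isEmpty()) {
            return positions; // an empty target would otherwise loop forever
        }
        int pos = text.indexOf(target);
        while (pos >= 0) {
            positions.add(pos);
            pos = text.indexOf(target, pos + target.length());
        }
        return positions;
    }

    public static void main(String[] args) {
        String text = "Julia Evans was born on 25-09-1984. "
            + "She is currently living in the USA (United States of America).";

        System.out.println(positionsOf(text, "in")); // [56, 60]
        System.out.println(text.lastIndexOf("in"));  // 60
    }
}
```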

3. Using subSequence


The String class provides another method called subSequence, which behaves similarly to the substring method.


The only differences are that it returns a CharSequence instead of a String, and that it can only be called with both a start and an end index:


assertEquals("USA (United States of America)", 
  text.subSequence(67, text.length() - 1));
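Because the declared return type is CharSequence, we can pass the result to any API that accepts a CharSequence, but we need toString() before calling String-only methods. The String.subSequence Javadoc states that it behaves exactly like substring(begin, end). A small sketch (the class name is hypothetical):

```java
public class SubSequenceExample {

    static final String TEXT = "Julia Evans was born on 25-09-1984. "
        + "She is currently living in the USA (United States of America).";

    public static void main(String[] args) {
        // Declared as CharSequence, even though it behaves like substring
        CharSequence seq = TEXT.subSequence(67, 70);

        // APIs that accept a CharSequence can take it directly
        StringBuilder sb = new StringBuilder("Country: ").append(seq);
        System.out.println(sb); // Country: USA

        // String-only methods need an explicit conversion
        String country = seq.toString();
        System.out.println(country.toLowerCase()); // usa

        // contentEquals compares characters regardless of the CharSequence type
        System.out.println("USA".contentEquals(seq)); // true
    }
}
```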

4. Using Regular Expressions


Regular expressions will come to our rescue if we have to extract a substring that matches a specific pattern.


In the example String, Julia’s date of birth is in the format “dd-mm-yyyy”. We can match this pattern using the Java regular expression API.


First of all, we need to create a pattern for “dd-mm-yyyy”:


Pattern pattern = Pattern.compile("\\d{2}-\\d{2}-\\d{4}");

Then, we’ll apply the pattern to find a match from the given text:


Matcher matcher = pattern.matcher(text);

Upon a successful match, we can extract the matched String:


if (matcher.find()) {                                  
    assertEquals("25-09-1984", matcher.group());
}

For more details on Java regular expressions, check out this tutorial.
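If we need the individual parts of the date rather than the whole match, capturing groups are a natural extension. The sketch below adds groups to the same pattern; the class name, the findDate helper, and the choice to return null when nothing matches are our own assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateExtractor {

    // The "dd-mm-yyyy" pattern from above, with capturing groups added
    private static final Pattern DATE = Pattern.compile("(\\d{2})-(\\d{2})-(\\d{4})");

    // Hypothetical helper: {day, month, year} of the first date found, or null
    static String[] findDate(String text) {
        Matcher matcher = DATE.matcher(text);
        if (!matcher.find()) {
            return null;
        }
        // group(0) is the whole match; groups 1 to 3 are the parts
        return new String[] { matcher.group(1), matcher.group(2), matcher.group(3) };
    }

    public static void main(String[] args) {
        String text = "Julia Evans was born on 25-09-1984. "
            + "She is currently living in the USA (United States of America).";

        String[] parts = findDate(text);
        if (parts != null) {
            System.out.println(String.join("/", parts)); // 25/09/1984
        }
    }
}
```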


5. Using split


We can use the split method from the String class to extract a substring. Say we want to extract the first sentence from the example String. This is quite easy to do using split:


String[] sentences = text.split("\\.");

Since the split method accepts a regex, we had to escape the period character. The result is an array of two sentences.


We can use the first sentence (or iterate through the whole array):


assertEquals("Julia Evans was born on 25-09-1984", sentences[0]);
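Two details are worth noting here: the second element keeps the space that followed the period, and Pattern.quote can be used instead of escaping the dot by hand. The following sketch, with a hypothetical sentences helper, splits on a period followed by optional whitespace so that neither sentence starts with a space. Because split drops trailing empty strings, the final period doesn’t produce an empty third element.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitSentences {

    static final String TEXT = "Julia Evans was born on 25-09-1984. "
        + "She is currently living in the USA (United States of America).";

    // Hypothetical helper: split on a period plus any whitespace after it
    static String[] sentences(String text) {
        return text.split("\\.\\s*");
    }

    public static void main(String[] args) {
        // Pattern.quote(".") yields a regex that matches a literal dot
        String[] quoted = TEXT.split(Pattern.quote("."));
        System.out.println(quoted.length);         // 2
        System.out.println("[" + quoted[1] + "]"); // the leading space is still there

        System.out.println(Arrays.toString(sentences(TEXT)));
    }
}
```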

Please note that there are better ways for sentence detection and tokenization using Apache OpenNLP. Check out this tutorial to learn more about the OpenNLP API.


6. Using Scanner


We generally use Scanner to parse primitive types and Strings using regular expressions. A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace.
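As a quick illustration of parsing primitive types, here’s a sketch that reads the three numbers of a date by switching the delimiter to a dash, and counts whitespace-separated tokens with the default delimiter. Both helper methods are hypothetical and assume well-formed input; nextInt throws an InputMismatchException if a token isn’t an integer.

```java
import java.util.Scanner;

public class ScannerParsing {

    static final String TEXT = "Julia Evans was born on 25-09-1984. "
        + "She is currently living in the USA (United States of America).";

    // Hypothetical helper: parses "dd-mm-yyyy" into {day, month, year}
    static int[] parseDate(String date) {
        try (Scanner scanner = new Scanner(date)) {
            scanner.useDelimiter("-"); // replace the default whitespace delimiter
            return new int[] { scanner.nextInt(), scanner.nextInt(), scanner.nextInt() };
        }
    }

    // Hypothetical helper: counts tokens using the default whitespace delimiter
    static int countTokens(String text) {
        int count = 0;
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                scanner.next();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] date = parseDate("25-09-1984");
        System.out.println(date[0] + "/" + date[1] + "/" + date[2]); // 25/9/1984
        System.out.println(countTokens(TEXT)); // 17
    }
}
```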


Let’s find out how to use this to get the first sentence from the example text:


try (Scanner scanner = new Scanner(text)) {
    scanner.useDelimiter("\\.");
    assertEquals("Julia Evans was born on 25-09-1984", scanner.next());
}

In the above example, we have set the example String as the source for the scanner to use.


Then we set the period character as the delimiter. It needs to be escaped; otherwise, it is treated as the special regular expression character that matches any single character.


Finally, we assert the first token from this delimited output.


If required, we can iterate through the complete collection of tokens using a while loop.


while (scanner.hasNext()) {
   // do something with the tokens returned by scanner.next()
}
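Putting those pieces together, the following self-contained sketch collects every sentence from the example text. The sentences helper is hypothetical; it trims each token because the second sentence starts with the space that followed the first period, and it skips tokens that are empty after trimming.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerSentences {

    static final String TEXT = "Julia Evans was born on 25-09-1984. "
        + "She is currently living in the USA (United States of America).";

    // Hypothetical helper: all period-delimited sentences, trimmed
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        // try-with-resources closes the Scanner when we're done
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("\\.");
            while (scanner.hasNext()) {
                String token = scanner.next().trim();
                if (!token.isEmpty()) {
                    result.add(token);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String sentence : sentences(TEXT)) {
            System.out.println(sentence);
        }
    }
}
```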

7. Maven Dependencies


We can go a bit further and use a useful utility, the StringUtils class, which is part of the Apache Commons Lang library (the commons-lang3 artifact):



You can find the latest version of this library here.


8. Using StringUtils


The Apache Commons libraries add some useful methods for manipulating core Java types. Apache Commons Lang provides a host of helper utilities for the java.lang API, most notably String manipulation methods.


In this example, we’re going to see how to extract a substring nested between two Strings:


assertEquals("United States of America", 
  StringUtils.substringBetween(text, "(", ")"));

There is a simplified version of this method for when the substring is nested between two instances of the same String:


substringBetween(String str, String tag)
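If adding the dependency isn’t an option, the same idea can be sketched with plain JDK calls. The between helpers below are our own approximation, not the library’s implementation; like StringUtils.substringBetween, they return null when a marker is missing. With the single-tag form, the text between the two dashes of the date is “09”, which is what StringUtils.substringBetween(text, "-") targets as well.

```java
public class BetweenTags {

    // JDK-only sketch: text between the first `open` and the next `close`,
    // or null if either marker (or any argument) is missing
    static String between(String str, String open, String close) {
        if (str == null || open == null || close == null) {
            return null;
        }
        int start = str.indexOf(open);
        if (start < 0) {
            return null;
        }
        int from = start + open.length();
        int end = str.indexOf(close, from);
        if (end < 0) {
            return null;
        }
        return str.substring(from, end);
    }

    // Single-tag variant: text between two occurrences of the same tag
    static String between(String str, String tag) {
        return between(str, tag, tag);
    }

    public static void main(String[] args) {
        String text = "Julia Evans was born on 25-09-1984. "
            + "She is currently living in the USA (United States of America).";

        System.out.println(between(text, "(", ")")); // United States of America
        System.out.println(between(text, "-"));      // 09
        System.out.println(between(text, "[", "]")); // null
    }
}
```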

The substringAfter method from the same class gets the substring after the first occurrence of a separator.


The separator isn’t returned:


assertEquals("the USA (United States of America).", 
  StringUtils.substringAfter(text, "living in "));

Similarly, the substringBefore method gets the substring before the first occurrence of a separator.


The separator isn’t returned:


assertEquals("Julia Evans", 
  StringUtils.substringBefore(text, " was born"));
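For comparison, here’s a JDK-only sketch of these two operations. The helper names are hypothetical. They follow the documented StringUtils behavior for a missing separator (substringAfter returns an empty String, substringBefore returns the whole input), but unlike the library they don’t accept null arguments.

```java
public class BeforeAfter {

    // Sketch of substringAfter: text after the first separator, or ""
    // when the separator isn't found; the separator itself is not returned
    static String after(String str, String separator) {
        int pos = str.indexOf(separator);
        return pos < 0 ? "" : str.substring(pos + separator.length());
    }

    // Sketch of substringBefore: text before the first separator, or the
    // whole input when the separator isn't found
    static String before(String str, String separator) {
        int pos = str.indexOf(separator);
        return pos < 0 ? str : str.substring(0, pos);
    }

    public static void main(String[] args) {
        String text = "Julia Evans was born on 25-09-1984. "
            + "She is currently living in the USA (United States of America).";

        System.out.println(after(text, "living in ")); // the USA (United States of America).
        System.out.println(before(text, " was born")); // Julia Evans
    }
}
```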

You can check out this tutorial to find out more about String processing using Apache Commons Lang API.


9. Conclusion


In this quick article, we looked at various ways to extract a substring from a String in Java. You can explore our other tutorials on String manipulation in Java.


As always, code snippets can be found over on GitHub.