public final class LoadText extends Object
@charset
or no special encoding specification.
Unlike FileUtil.loadString(java.io.File) and URLUtil.loadString(java.net.URL),
this utility class implements the detection of the encoding.
Note that the detection of the encoding always succeeds because it uses a fallback value.
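
The fallback behaviour described above can be pictured with a small, self-contained sketch. It is not the implementation of `LoadText`: the class `CharsetFallbackSketch` and its methods are hypothetical names introduced here, and falling back to the platform default charset is an assumption made for illustration. It only shows why a lookup chain that ends in a guaranteed value can never fail.

```java
import java.nio.charset.Charset;

/** Hypothetical helper (not part of LoadText): a lookup chain that always yields a charset. */
final class CharsetFallbackSketch {

    /** Returns the Charset for name, or null if name is null, malformed or unsupported. */
    static Charset lookup(String name) {
        if (name == null) {
            return null;
        }
        try {
            return Charset.forName(name);
        } catch (IllegalArgumentException e) {
            // Covers both IllegalCharsetNameException and UnsupportedCharsetException.
            return null;
        }
    }

    /** Never returns null: detected name first, then the fallback, then the platform default. */
    static Charset resolve(String detected, String fallbackEncoding) {
        Charset charset = lookup(detected);
        if (charset == null) {
            charset = lookup(fallbackEncoding);
        }
        if (charset == null) {
            // Assumption: the platform default stands in for "a sensible value".
            charset = Charset.defaultCharset();
        }
        return charset;
    }
}
```

With this sketch, `CharsetFallbackSketch.resolve(null, null)` still returns a usable charset, which mirrors the guarantee stated in the class description.
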

| Modifier and Type | Class and Description |
|---|---|
| `static class` | `LoadText.EmacsStyleDetector`: Detects an encoding by parsing `-*- coding: ENCODING -*-`. |
| `static class` | `LoadText.Encoding`: Encoding returned by `guessEncoding(byte[], int, int)`. |
| `static interface` | `LoadText.EncodingDetector`: Detects an encoding by parsing an ASCII encoding specification (example: `@charset "UTF-8";`). |
| `static class` | `LoadText.EncodingDetectorBase`: A base class which checks the validity of the encoding returned by `LoadText.EncodingDetectorBase.doDetectEncoding(java.lang.String)`. |
| `static class` | `LoadText.HTMLCharsetDetector`: Detects an encoding by parsing `<meta charset="ENCODING" >` or `<meta http-equiv="Content-Type" content="text/html; charset=ENCODING">`. |
| `static class` | `LoadText.KeywordBasedDetector`: Detects an encoding by parsing `KEYWORD "ENCODING";`, for example `@charset "ENCODING";`. |
| `static class` | `LoadText.XMLEncodingDetector`: Detects an encoding by parsing `<?xml encoding="ENCODING"?>`. |

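
The declaration syntaxes listed for the detector classes above can be matched with ordinary regular expressions. The sketch below is an illustration only: `DeclarationPatternsSketch` and `detect` are hypothetical names, the patterns are simplified guesses rather than the expressions used by the library, and the input is assumed to be the first lines of the file already decoded as ASCII-compatible text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch (not the LoadText detectors): simplified patterns for each declaration style. */
final class DeclarationPatternsSketch {
    // <?xml version="1.0" encoding="ENCODING"?>
    private static final Pattern XML_DECL = Pattern.compile(
        "<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z0-9._:-]+)[\"']");
    // <meta charset="ENCODING"> or <meta http-equiv="Content-Type" content="text/html; charset=ENCODING">
    private static final Pattern HTML_META = Pattern.compile(
        "<meta\\b[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)", Pattern.CASE_INSENSITIVE);
    // @charset "ENCODING";
    private static final Pattern CSS_CHARSET = Pattern.compile(
        "@charset\\s+\"([^\"]+)\"\\s*;");
    // -*- coding: ENCODING -*-
    private static final Pattern EMACS = Pattern.compile(
        "-\\*-.*?coding:\\s*([A-Za-z0-9._-]+).*?-\\*-");

    private static final Pattern[] ALL = {XML_DECL, HTML_META, CSS_CHARSET, EMACS};

    /** Returns the first declared encoding name found in head, or null if none matches. */
    static String detect(String head) {
        for (Pattern pattern : ALL) {
            Matcher matcher = pattern.matcher(head);
            if (matcher.find()) {
                return matcher.group(1);
            }
        }
        return null;
    }
}
```

The returned name is taken verbatim from the text, so a caller would still need to validate it before use, which is the role the table assigns to `LoadText.EncodingDetectorBase`.
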
| Modifier and Type | Field and Description |
|---|---|
| `static LoadText.EncodingDetector[]` | `ALL_ENCODING_DETECTORS`: A ready-to-use array containing all `LoadText.EncodingDetector`s. |
| `static byte[]` | `BOM_UTF16_BE`: The UTF-16BE BOM (Byte Order Mark). |
| `static byte[]` | `BOM_UTF16_LE`: The UTF-16LE BOM (Byte Order Mark). |
| `static byte[]` | `BOM_UTF8`: The UTF-8 BOM (Byte Order Mark). |
| `static LoadText.KeywordBasedDetector` | `CSS_CHARSET_DETECTOR`: A ready-to-use instance of `KeywordBasedDetector("@charset")` (CSS stylesheets). |
| `static LoadText.EmacsStyleDetector` | `EMACS_STYLE_DETECTOR`: A ready-to-use instance of `LoadText.EmacsStyleDetector`. |
| `static LoadText.HTMLCharsetDetector` | `HTML_CHARSET_DETECTOR`: A ready-to-use instance of `LoadText.HTMLCharsetDetector`. |
| `static LoadText.XMLEncodingDetector` | `XML_ENCODING_DETECTOR`: A ready-to-use instance of `LoadText.XMLEncodingDetector`. |

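
For readers unfamiliar with byte order marks, the following sketch shows how the three BOMs named in the field table can be recognized at the start of a buffer. The byte values are the standard Unicode sequences; the class `BomSketch`, its `detectBom` method and the way the BOM length is reported are assumptions made for illustration, not the library's code.

```java
/** Hypothetical sketch (not LoadText itself): recognizing a BOM at the start of a buffer. */
final class BomSketch {
    // Standard Unicode byte order marks.
    static final byte[] BOM_UTF8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
    static final byte[] BOM_UTF16_BE = {(byte) 0xFE, (byte) 0xFF};
    static final byte[] BOM_UTF16_LE = {(byte) 0xFF, (byte) 0xFE};

    private static boolean startsWith(byte[] bytes, int byteCount, byte[] bom) {
        if (byteCount < bom.length) {
            return false;
        }
        for (int i = 0; i < bom.length; i++) {
            if (bytes[i] != bom[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns the encoding implied by a leading BOM, or null if there is none.
     * When bomLength is a non-empty array, its first element receives the BOM length (0 if no BOM).
     */
    static String detectBom(byte[] bytes, int byteCount, int[] bomLength) {
        int available = Math.min(byteCount, bytes.length);
        String encoding = null;
        int length = 0;
        if (startsWith(bytes, available, BOM_UTF8)) {
            encoding = "UTF-8";
            length = BOM_UTF8.length;
        } else if (startsWith(bytes, available, BOM_UTF16_BE)) {
            encoding = "UTF-16BE";
            length = BOM_UTF16_BE.length;
        } else if (startsWith(bytes, available, BOM_UTF16_LE)) {
            encoding = "UTF-16LE";
            length = BOM_UTF16_LE.length;
        }
        if (bomLength != null && bomLength.length > 0) {
            bomLength[0] = length;
        }
        return encoding;
    }
}
```

Reporting the BOM length separately lets a caller skip those bytes before decoding, so the mark does not show up as a stray character at the start of the text.
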
| Modifier and Type | Method and Description |
|---|---|
| `static String` | `checkEncoding(String encoding)`: Returns the canonical name of `encoding` if valid; `null` otherwise. |
| `static Reader` | `createReader(InputStream in, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors)`: Creates a reader which can be used to read the contents of the specified text source. |
| `static String` | `detectEncoding(byte[] bytes, int byteCount, int[] bomLength, LoadText.EncodingDetector... detectors)`: Detects the encoding by examining the specified bytes, which have been read at the very start of a text file. |
| `static LoadText.Encoding` | `guessEncoding(byte[] bytes, int offset, int length)`: Guesses the encoding of a text file by examining its first few bytes. |
| `static String` | `loadChars(Reader in)`: Loads the characters contained in the specified source. |
| `static String` | `loadText(File file, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors)` |
| `static String` | `loadText(InputStream in, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors)`: Loads the contents of the specified text source. |
| `static String` | `loadText(URL url, boolean followRedirects, int timeout, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors)`: Loads the contents of the specified text file. |
| `static String` | `loadText(URL url, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors)` |


`public static final LoadText.XMLEncodingDetector XML_ENCODING_DETECTOR`

A ready-to-use instance of `LoadText.XMLEncodingDetector`.

`public static final LoadText.KeywordBasedDetector CSS_CHARSET_DETECTOR`

A ready-to-use instance of `KeywordBasedDetector("@charset")` (CSS stylesheets).

`public static final LoadText.HTMLCharsetDetector HTML_CHARSET_DETECTOR`

A ready-to-use instance of `LoadText.HTMLCharsetDetector`.

`public static final LoadText.EmacsStyleDetector EMACS_STYLE_DETECTOR`

A ready-to-use instance of `LoadText.EmacsStyleDetector`.

`public static final LoadText.EncodingDetector[] ALL_ENCODING_DETECTORS`

A ready-to-use array containing all `LoadText.EncodingDetector`s.

`public static final byte[] BOM_UTF16_BE`

The UTF-16BE BOM (Byte Order Mark).

`public static final byte[] BOM_UTF16_LE`

The UTF-16LE BOM (Byte Order Mark).

`public static final byte[] BOM_UTF8`

The UTF-8 BOM (Byte Order Mark).

`public static String loadText(File file, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) throws IOException`

Throws:
- `IOException`

`public static String loadText(URL url, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) throws IOException`

Throws:
- `IOException`

`public static String loadText(URL url, boolean followRedirects, int timeout, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) throws IOException`

Loads the contents of the specified text file.

Parameters:
- `url` - the location of the text file.
- `followRedirects` - if `true`, follow redirections ("301: Moved Permanently", "302: Temporary Redirect"), including the very common http to https ones. No effect unless `url` is an http/https URL.
- `timeout` - specifies both the connect and read timeout values, in milliseconds. 0 means an infinite timeout. A negative value means the default value.
- `fallbackEncoding` - the fallback encoding. May be `null`, in which case a sensible value (generally `SystemUtil.defaultEncoding`) is automatically determined.
- `encoding` - the encoding actually used to load the text is copied there. May be `null`.
- `detectors` - unless a BOM is found, the specified encoding detectors are used to parse the first lines of the file, which may contain an encoding specification like `@charset "UTF-8";`.

Throws:
- `IOException` - if there is an I/O problem

`public static String loadText(InputStream in, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) throws IOException`

Loads the contents of the specified text source. This method implements the detection of the encoding. Note that the detection of the encoding always works because it uses a fallback value.

Parameters:
- `in` - the text source.
- `fallbackEncoding` - the fallback encoding. May be `null`, in which case a sensible value (generally `SystemUtil.defaultEncoding`) is automatically determined.
- `encoding` - the encoding actually used to load the text is copied there. May be `null`.
- `detectors` - unless a BOM is found, the specified encoding detectors are used to parse the first lines of the file, which may contain an encoding specification like `@charset "UTF-8";`.

Throws:
- `IOException` - if there is an I/O problem

`public static String loadChars(Reader in) throws IOException`

Loads the characters contained in the specified source.

Parameters:
- `in` - the character source

Throws:
- `IOException` - if there is an I/O problem

`public static Reader createReader(InputStream in, String fallbackEncoding, String[] encoding, LoadText.EncodingDetector... detectors) throws IOException`

Creates a reader which can be used to read the contents of the specified text source.

Parameters:
- `in` - the text source.
- `fallbackEncoding` - the fallback encoding. May be `null`, in which case a sensible value (generally `SystemUtil.defaultEncoding`) is automatically determined.
- `encoding` - the encoding actually used to load the text is copied there. May be `null`.
- `detectors` - unless a BOM is found, the specified encoding detectors are used to parse the first lines of the file, which may contain an encoding specification like `@charset "UTF-8";`.

Throws:
- `IOException` - if there is an I/O problem

`public static final String checkEncoding(String encoding)`

Returns the canonical name of `encoding` if valid; `null` otherwise.

`public static String detectEncoding(byte[] bytes, int byteCount, int[] bomLength, LoadText.EncodingDetector... detectors)`

Detects the encoding by examining the specified bytes, which have been read at the very start of a text file.

Parameters:
- `bytes` - bytes read at the beginning of a text file.
- `byteCount` - number of bytes read at the beginning of a text file.
- `bomLength` - the length of the BOM is stored as the first element of this array. This makes it possible to skip the BOM. May be `null`.
- `detectors` - unless a BOM is found, the specified encoding detectors are used to parse the first lines of the file, which may contain an encoding specification like `@charset "UTF-8";`.

Returns:
- the detected encoding, if any; `null` otherwise

`public static LoadText.Encoding guessEncoding(byte[] bytes, int offset, int length)`

Guesses the encoding of a text file by examining its first few bytes.

Parameters:
- `bytes` - byte buffer.
- `offset` - byte buffer offset.
- `length` - byte buffer length. Must be at least 4 for this function to work.

Returns:
- the guessed encoding, if any; `Encoding.UNKNOWN` otherwise
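
To make the four-byte requirement of `guessEncoding` concrete, here is a sketch of the kind of heuristic such a method could apply. Everything in it is an assumption: `GuessSketch`, its `Guess` enum and the rules themselves (BOM check first, then the zero-byte pattern that ASCII text produces in UTF-16) are illustrative and are not taken from the library.

```java
/** Hypothetical sketch (not LoadText.guessEncoding): guess from the first four bytes. */
final class GuessSketch {
    enum Guess { UTF_8, UTF_16BE, UTF_16LE, UNKNOWN }

    static Guess guess(byte[] bytes, int offset, int length) {
        // Four bytes are needed to see two consecutive UTF-16 code units.
        if (bytes == null || offset < 0 || length < 4 || offset > bytes.length - length) {
            return Guess.UNKNOWN;
        }
        int b0 = bytes[offset] & 0xFF;
        int b1 = bytes[offset + 1] & 0xFF;
        int b2 = bytes[offset + 2] & 0xFF;
        int b3 = bytes[offset + 3] & 0xFF;

        // A BOM is the strongest hint.
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) {
            return Guess.UTF_8;
        }
        if (b0 == 0xFE && b1 == 0xFF) {
            return Guess.UTF_16BE;
        }
        if (b0 == 0xFF && b1 == 0xFE) {
            return Guess.UTF_16LE;
        }
        // Without a BOM, ASCII characters encoded as UTF-16 alternate zero and non-zero bytes.
        if (b0 == 0 && b1 != 0 && b2 == 0 && b3 != 0) {
            return Guess.UTF_16BE;
        }
        if (b0 != 0 && b1 == 0 && b2 != 0 && b3 == 0) {
            return Guess.UTF_16LE;
        }
        return Guess.UNKNOWN;
    }
}
```

A result of `UNKNOWN` in this sketch simply means the bytes carry no recognizable signature, at which point a caller would move on to declaration-based detection or the fallback encoding.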