Why does slicing in Rust fail on anything that's not English?

Apr 16, 2025 - 08:50
The thing is, slicing in Rust happens by byte index, not char index!

So when you do:

let full = String::from("hello world");
let part = &full[0..5]; // slice bytes 0 up to (but not including) 5
println!("{}", part);   // prints "hello"

Every character here takes exactly 1 byte, because ASCII characters (which cover basic English text) are encoded as a single byte in UTF-8.

'A'  => 01000001  (1 byte)
'z'  => 01111010  (1 byte)

So when you slice bytes 0 through 4, you get exactly the five characters of "hello"!
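To make the ASCII case concrete, here is a minimal sketch that compares the byte length with the character count and prints the bit pattern of each byte. The helper name `byte_and_char_len` is made up for this illustration; it is not part of the standard library.

```rust
/// Hypothetical helper: returns (byte length, character count) for a string.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    let full = String::from("hello world");

    // For ASCII-only text the two counts are equal,
    // so a byte index and a character index point at the same place.
    let (bytes, chars) = byte_and_char_len(&full);
    println!("bytes = {}, chars = {}", bytes, chars);

    // Show the first five bytes in binary next to the character they encode.
    for b in full.bytes().take(5) {
        println!("{:?} => {:08b}", b as char, b);
    }
}
```

Because `len()` counts bytes while `chars().count()` counts Unicode scalar values, the two only agree when every character fits in one byte.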

Things change when the string contains non-ASCII characters, such as Hindi text or emojis. Let's take an example:

let s = String::from("नमस्ते");
let slice = &s[0..2]; // ⚠️ this will panic! byte 2 falls inside the 3-byte character 'न'

Why do you think this panicked?
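Rust strings are UTF-8 encoded, which means non-ASCII characters (Hindi letters, emojis, and so on) take 2 to 4 bytes each. Slicing panics whenever a range boundary falls in the middle of one of these multi-byte characters, because the result would not be valid UTF-8.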
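One way to avoid the panic is to check the boundaries before slicing. The sketch below uses the standard `str::get`, `str::is_char_boundary`, and `str::char_indices` methods; the wrapper name `safe_slice` is hypothetical and only there to keep the example short.

```rust
/// Hypothetical helper: returns the slice only if both ends of the range
/// sit on character boundaries, and `None` otherwise (no panic).
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let s = String::from("नमस्ते");

    // Byte offsets at which each character starts; these are the only
    // offsets (plus s.len()) that are safe to use as slice boundaries.
    for (offset, ch) in s.char_indices() {
        println!("byte {:>2}: {:?}", offset, ch);
    }

    // Byte 2 is in the middle of the first character.
    println!("is_char_boundary(2) = {}", s.is_char_boundary(2));

    // A range ending mid-character yields None instead of panicking.
    println!("{:?}", safe_slice(&s, 0, 2));

    // A range that ends on a boundary works as usual.
    println!("{:?}", safe_slice(&s, 0, 6));
}
```

If the goal is "the first N characters" rather than "the first N bytes", iterating with `chars()` or looking up the byte offset via `char_indices()` is usually the safer route.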

Character   UTF-8 Encoding    Bytes
A           0x41              1
ñ           0xC3 0xB1         2
न           0xE0 0xA4 0xA8    3
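The encodings in the table can be checked directly from Rust. This is a small sketch using `char::encode_utf8` and `char::len_utf8` from the standard library; the function name `utf8_hex` is an assumption made for this example.

```rust
/// Hypothetical helper: formats the UTF-8 bytes of a character as
/// space-separated hex values, e.g. "0xC3 0xB1".
fn utf8_hex(c: char) -> String {
    // Four bytes is the maximum length of a UTF-8 encoded character.
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf)
        .bytes()
        .map(|b| format!("0x{:02X}", b))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // One row per character: the character, its bytes, and the byte count.
    for &c in ['A', 'ñ', 'न'].iter() {
        println!("{}  {}  {}", c, utf8_hex(c), c.len_utf8());
    }
}
```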
