They are all alveolo-palatal sibilants in Japanese, while in English they’re palato-alveolar sibilants. The difference is very hard for English speakers to hear, but the distinction is common enough to exist in many languages. And the “ch”/“j”/“sh”/“zh” sounds I speak of are just the usual variants of “t”/“d”/“s”/“z” that occur before “i” (they are spelled si -> shi, zi -> zhi/ji, ti -> chi, di -> ji).
Usually “zhi” isn’t spelled out in Rōmaji, though. It’s often spelled “ji” even in the cases where the two are pronounced differently, so “zi” and “di” end up being spelled the same. That is perhaps confusing, but most people pronounce them the same, so it doesn’t really matter. I think pronouncing them differently is more of an archaic, obsolete, or dialectal thing anyway.
The “h” in “hi” also sounds different: it is made further forward, against the palate, a bit like the “ch” in German “ich”.
The spelling also changes in the same way before a syllable that starts with a “y” sound, e.g. syu -> shu or dyo -> jo.
Before “u” some consonants also change (hu -> fu, tu -> tsu, du -> dzu).
These sound changes don’t occur for all speakers/dialects; some don’t have a “shi” and just say “si”, for example. But they are the most common and standard, I believe.
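If it helps to see the spelling changes laid out mechanically, here is a rough Python sketch of them. The names `RESPELLINGS` and `respell` are just made up for this example, and it assumes the input is written in the plain “si/ti/tu” style with nothing already respelled. It only covers the changes listed above, not long vowels or anything else.

```python
import re

# Spelling changes from the post, keyed by the plain ("as written") form.
# "du" -> "dzu" follows the post; other romanizations may write it differently.
RESPELLINGS = {
    "sy": "sh", "zy": "j", "ty": "ch", "dy": "j",      # before a "y" sound
    "si": "shi", "zi": "ji", "ti": "chi", "di": "ji",  # before "i"
    "hu": "fu", "tu": "tsu", "du": "dzu",              # before "u"
}

# One alternation over all keys, so the text is scanned in a single pass
# and a replacement such as "sh" is never re-matched by the "hu" rule.
_PATTERN = re.compile("|".join(RESPELLINGS))


def respell(word: str) -> str:
    """Apply the spelling changes to a lowercase, plainly spelled word."""
    return _PATTERN.sub(lambda m: RESPELLINGS[m.group(0)], word)


if __name__ == "__main__":
    for plain in ["susi", "syu", "dyo", "huzi", "tuti"]:
        print(plain, "->", respell(plain))
```

Doing it in a single pass is a deliberate choice: if the rules were applied one after another, “syu” would first become “shu” and the “hu” rule could then wrongly turn it into “sfu”.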