You are comparing two kinds of tests: unit tests and integration tests. Unit tests should be fast. Personally, I like a unit test to run in a fraction of a second. This matters when you run a test repeatedly while writing or changing code. To keep a unit test fast, it can't call any remote resource or access a database. Most definitely, you don't want to start a chat server in your unit test.
Unit tests should also help you with refactoring. That works best when each unit test goes only as far as the nearest architectural boundary. You test that your code reaches the boundary and hands the correct message over it. You may also check that your code correctly handles whatever may come back from across the boundary. This way your system is loosely coupled and changeable, and so are your tests.
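To make the idea of "going only to the nearest boundary" concrete, here is a minimal sketch using Python's `unittest.mock`. The `ChatService` class, its `send_message` method, the `transport` collaborator with a `send(recipient, text)` method, and the "queued" fallback are all assumptions made up for illustration; your real classes will look different.

```python
import unittest
from unittest.mock import Mock


class ChatService:
    """Hypothetical test subject; the real ChatService API is assumed, not known."""

    def __init__(self, transport):
        # `transport` stands for whatever sits on the other side of the
        # architectural boundary (assumed interface: send(recipient, text)).
        self._transport = transport

    def send_message(self, recipient, text):
        try:
            self._transport.send(recipient, text.strip())
        except ConnectionError:
            # Assumed policy for this sketch: keep the message for a later retry.
            return "queued"
        return "sent"


class ChatServiceTest(unittest.TestCase):
    def test_hands_correct_message_over_the_boundary(self):
        transport = Mock()  # no network, no server, so the test stays fast
        service = ChatService(transport)

        status = service.send_message("alice", "  hello  ")

        # The boundary was reached exactly once, with the right parameters.
        transport.send.assert_called_once_with("alice", "hello")
        self.assertEqual(status, "sent")

    def test_handles_failure_coming_back_across_the_boundary(self):
        transport = Mock()
        transport.send.side_effect = ConnectionError("server unreachable")
        service = ChatService(transport)

        self.assertEqual(service.send_message("alice", "hello"), "queued")
```

Neither test knows how the transport works internally, so you can rewrite the transport freely without touching them. They can be run with `python -m unittest`.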
In addition to unit tests, you also test that your system interacts correctly across the boundaries and with other systems. People call these tests interaction or integration tests. They may take longer to run because they typically perform remote access and may even have to launch a test instance of a server. Because you already decoupled your layers in unit tests, you don't need to test every possible permutation of inputs and outputs in your integration tests. Each integration test runs longer, but you have fewer of them.
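As a rough sketch of what such an integration test might look like, the example below launches a throwaway TCP server inside the test and sends one message across the real network boundary. The `EchoHandler` is only a stand-in for a chat server, and `SocketTransport` with its `recipient:text` line protocol is a hypothetical client invented for this illustration.

```python
import socket
import socketserver
import threading
import unittest


class EchoHandler(socketserver.StreamRequestHandler):
    """Stand-in for a chat server: acknowledges the one line it receives."""

    def handle(self):
        line = self.rfile.readline()
        self.wfile.write(b"ACK " + line)


class SocketTransport:
    """Hypothetical client-side transport that talks to the server over TCP."""

    def __init__(self, host, port):
        self._address = (host, port)

    def send(self, recipient, text):
        with socket.create_connection(self._address, timeout=5) as conn:
            conn.sendall(f"{recipient}:{text}\n".encode("utf-8"))
            with conn.makefile("rb") as reply:
                return reply.readline().decode("utf-8").strip()


class TransportIntegrationTest(unittest.TestCase):
    def setUp(self):
        # Port 0 asks the OS for any free port, so parallel runs don't collide.
        self.server = socketserver.TCPServer(("127.0.0.1", 0), EchoHandler)
        self.thread = threading.Thread(target=self.server.serve_forever, daemon=True)
        self.thread.start()

    def tearDown(self):
        self.server.shutdown()
        self.server.server_close()
        self.thread.join(timeout=5)

    def test_message_crosses_the_network_boundary(self):
        host, port = self.server.server_address
        transport = SocketTransport(host, port)

        self.assertEqual(transport.send("alice", "hello"), "ACK alice:hello")
```

Note the cost: every test pays for starting and stopping a server and for a real socket round trip. That is acceptable for a handful of tests per boundary, but not for every permutation of inputs.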
Putting it all together, I suggest that you create "small" unit tests for your ChatService, ChatNotificationDelegate and so on, each going only as far as the next service. You can mock the other service to verify that your test subject passes the right parameters across the boundary and reacts correctly to what it gets back. Aim for high unit test coverage. In addition, you will have a small number of integration tests that each work across one boundary. Or maybe even a single end-to-end test of sending the message, as you suggested.
So my answer to
"test every bit we know exactly what went wrong, or to make my tests as general as possible (just the endpoints) to allow more freedom of refactoring the code"
is both: many small unit tests to know exactly what went wrong, and one big integration test to be sure all the pieces fit together. I'd also like to mention that a single end-to-end test doesn't "allow freedom of refactoring". The reverse is often true. Such a black-box test can tell you that something is wrong, but you will have no idea what exactly broke. If a system can break mysteriously, with no easy way to identify or at least narrow down the root cause, you are more likely to fear making significant changes, such as refactoring.