How to get data from Telegram using Python

2 June 2023 8 minutes Author: Cyber Witcher

Telegram data mining with Python: Unleashing the power of the API

Python is a powerful and popular programming language known for its simplicity and efficiency. It is used to develop various programs, web applications, scientific research and many other projects. Python has a clean syntax that makes it easy to learn and understand, and provides a wide range of possibilities for developers. Python supports a large number of libraries and frameworks, which allows you to expand its functionality and usability. There are many specialized libraries for different fields such as web development, scientific computing, data processing, artificial intelligence and many others. Thanks to its versatility and ease of use, Python has become one of the most popular programming languages in the world. It is used as a starting language for beginners in programming, and is also a tool in the hands of experienced developers. Many large companies and organizations use Python for their projects because of its efficiency and flexibility.

Python also has many libraries that simplify interaction with external services. Using Python and a Telegram API library, you can send requests to the Telegram servers and retrieve data such as messages, user profiles and contact lists. This article explains how to connect to the Telegram API using Python, configure authentication, make requests to the server, and retrieve the required data. We’ll also share useful tips and code examples to help you get started with the Telegram API without too much effort. Accessing Telegram data with Python is useful for developers, researchers and analysts who want to learn more about users and their activity in this popular messenger.

Requirements

Be sure to grab the latest source code from the GitHub repository. You can also watch this video tutorial on how to use this script.

For research purposes and analysis of Telegram channel content, you may need channel data in plain JSON format.

I created a Python script to retrieve data from Telegram channels. It has two main files, one to retrieve member data from the channel and one to retrieve channel messages. This script stores this data in JSON files; you can use them for analysis or to import them into your databases.

Python 3 must be installed

Also, I used the telethon Python package to work with Telegram.

To install telethon, you need to use the pip command:

pip3 install telethon

You can read the Telethon documentation to learn about all the features of this package.

Get your Telegram API credentials

To connect to Telegram, we need an api_id and an api_hash. To get these values, log in to Telegram core (my.telegram.org) and go to the API development tools area. Fill in the form there, and you will receive your api_id and api_hash.

Telegram’s help documentation also explains how to get your API credentials.

Create a Telegram client in your Python script

This part is almost the same for both getting channel members and getting channel messages. First, we need some basic imports:

I used configparser to read the API credentials from the config file and json to dump the data into JSON files.

We import what we need from Telethon to create a Telegram client in our script.

As you probably know, storing Telegram API credentials in the source code is not safe. If you add credentials directly to your source code, you risk your own security, since anyone who can read the code can use them, and you encourage the same unsafe habit in people who reuse your code.

So to avoid security issues, we put our API credentials in another file called config.ini. It has the following simple structure:
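The original file is not reproduced here, so the layout below is only an assumption: a single [Telegram] section with api_id, api_hash, phone and username keys, all holding placeholder values. The section and key names are illustrative. The sketch embeds that layout in a string and parses it with configparser to confirm it is well-formed:

```python
import configparser

# Hypothetical layout for config.ini. The section and key names are
# assumptions, and every value is a placeholder to replace with your own.
SAMPLE_CONFIG = """\
[Telegram]
# values are written without quotes
api_id = 1234567
api_hash = 0123456789abcdef0123456789abcdef
# full phone number, including + and the country code
phone = +10000000000
username = my_session
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONFIG)

# List the keys found in the [Telegram] section.
print(list(parser["Telegram"]))
```

Keep the real config.ini out of version control (for example, by listing it in .gitignore) so the credentials never reach a public repository.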

Now, to create a Telegram client in our Python script, we first read these credentials into our code:
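A minimal sketch of that step, assuming config.ini has a [Telegram] section with api_id, api_hash, phone and username keys. Those names and the load_credentials helper are illustrative, not taken from the original script:

```python
import configparser


def load_credentials(path="config.ini"):
    """Read the Telegram settings from an INI file.

    Assumes a [Telegram] section with api_id, api_hash, phone and
    username keys (hypothetical names used for illustration).
    """
    parser = configparser.ConfigParser()
    # read() returns the list of files it managed to parse.
    if not parser.read(path):
        raise FileNotFoundError(f"could not read {path}")
    section = parser["Telegram"]
    return {
        "api_id": section.getint("api_id"),  # the API ID is an integer
        "api_hash": section["api_hash"],
        "phone": section["phone"],
        "username": section["username"],
    }
```

Calling load_credentials() with no argument reads config.ini from the current working directory.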

Now that we have everything we need, we try to log into Telegram and create a client object to receive data:
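The login step might look roughly like this. It is a sketch that assumes Telethon 1.x, where client methods are coroutines. The create_client helper and its parameter names are my own, and session_name is simply the name Telethon uses for its local session file:

```python
async def create_client(session_name, api_id, api_hash, phone):
    """Log in and return a connected Telethon client (illustrative helper)."""
    # Imported here so the sketch can be loaded without Telethon installed.
    from telethon import TelegramClient

    client = TelegramClient(session_name, api_id, api_hash)
    # start() connects, sends a login code to `phone`, and prompts on the
    # terminal for that code (and for the account password, if one is set).
    await client.start(phone=phone)
    return client
```

Call it from inside your own async function, for example client = await create_client("my_session", api_id, api_hash, phone), and keep all later Telegram calls in the same event loop.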

Telegram checks your credentials and then asks for a verification code and, if you have set one for your account, your password. It’s exactly the same as logging into your Telegram account in the app or online.

Be aware that when this script is run, it has access to your Telegram account. Make sure you run the script in a safe environment.

Now we have a client object ready and we can use this object to connect and communicate with Telegram.

Getting channel members

We will do this in two steps. First, we get all channel member data from Telegram, and then store this data in a JSON file.

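Before performing these steps, remember to add three more imports to the script header: GetParticipantsRequest (from telethon.tl.functions.channels), plus ChannelParticipantsSearch and PeerChannel (from telethon.tl.types).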

Telegram channel members request

First of all, we ask the user for a Telegram channel. You can provide the script with a channel URL or a unique channel ID.

So, we read the user’s input and turn it into a Telegram channel:

If the user gives us a channel ID, we can turn it into a PeerChannel object. And if the user gives us a Telegram channel URL (e.g. https://t.me/channel), we can use it directly.
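The ID-or-URL decision can be sketched as follows. The helper names parse_channel_input and resolve_channel are hypothetical, and the sketch assumes Telethon 1.x, where client.get_entity is a coroutine:

```python
def parse_channel_input(user_input):
    """Return an int for a numeric channel ID, otherwise the cleaned string."""
    text = user_input.strip()
    if text.isdigit():
        return int(text)
    return text


async def resolve_channel(client, user_input):
    """Turn the user's input into a channel entity (illustrative helper)."""
    # Imported here so the parsing part can be used without Telethon.
    from telethon.tl.types import PeerChannel

    target = parse_channel_input(user_input)
    if isinstance(target, int):
        # Numeric IDs have to be wrapped so Telethon treats them as channels.
        target = PeerChannel(target)
    # URLs such as https://t.me/channel are passed through unchanged.
    return await client.get_entity(target)
```

For example, my_channel = await resolve_channel(client, input("Channel URL or ID: ")) inside an async function.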

The next step is to fetch the channel members. First, you need to know that Telegram does not return all the data you request at once, but provides it in batches. We can get 100 participants per request.

We set the limit to 100, start at offset 0, and create a list that will hold the channel members. In an infinite loop, we send a GetParticipantsRequest that searches the channel’s participant list with an empty search string, which matches all users. As I mentioned, we can only get 100 members per request. After each request, we check whether the returned participants object has any users. If it has none, we have already fetched all the users, so we break out of the loop. If it has users, we add them to the list of all members and increase the offset by the number of members received, so the next request asks for users starting at that offset. This cycle continues until it has covered all channel members.
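Here is one way to sketch that loop, assuming Telethon 1.x. I split it in two so the paging logic can be tried without a Telegram connection: collect_participants only knows about offsets and batches, while telethon_page_fetcher wraps the actual request. Both names are my own and are not part of Telethon:

```python
async def collect_participants(fetch_page, limit=100):
    """Call fetch_page(offset, limit) until it returns no more users."""
    offset = 0
    all_participants = []
    while True:
        users = await fetch_page(offset, limit)
        if not users:
            break  # an empty batch means every member has been fetched
        all_participants.extend(users)
        offset += len(users)  # the next request starts after this batch
    return all_participants


def telethon_page_fetcher(client, channel):
    """Build a fetch_page coroutine backed by GetParticipantsRequest."""
    # Imported here so the paging loop above works without Telethon.
    from telethon.tl.functions.channels import GetParticipantsRequest
    from telethon.tl.types import ChannelParticipantsSearch

    async def fetch_page(offset, limit):
        participants = await client(GetParticipantsRequest(
            channel=channel,
            filter=ChannelParticipantsSearch(''),  # empty search matches all
            offset=offset,
            limit=limit,
            hash=0,
        ))
        return participants.users

    return fetch_page
```

With a logged-in client, the call would be all_participants = await collect_participants(telethon_page_fetcher(client, my_channel)).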

Store the data in a JSON file

This is the easiest part. Although you can store the data in any database such as MySQL, MongoDB, etc., the easiest way is to store the data in a JSON file. However, if you have a lot of data, it is better to consider storing it in a database.

You can store the entire member object in a JSON file, but I prefer to store only what I need. So, I create a list to hold the member data and then write a JSON dump of that list to a file.

Simple and easy: I build a dictionary for each member and add it to the list. After that, I write the JSON dump to a file.
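A minimal sketch of this step. The choice of fields and the user_data.json file name are assumptions; the attribute names (id, first_name, last_name, username, phone, bot) are those of Telethon’s User objects:

```python
import json


def member_to_dict(user):
    """Keep only the fields of interest from a Telethon User object."""
    return {
        "id": user.id,
        "first_name": user.first_name,
        "last_name": user.last_name,
        "username": user.username,
        "phone": user.phone,
        "is_bot": user.bot,
    }


def save_members(users, path="user_data.json"):
    """Write one dictionary per member to a JSON file and return the list."""
    records = [member_to_dict(user) for user in users]
    with open(path, "w", encoding="utf-8") as outfile:
        # ensure_ascii=False keeps non-Latin names readable in the file.
        json.dump(records, outfile, ensure_ascii=False, indent=2)
    return records
```

Fields that a member has not set, such as last_name or phone, come back as None and are written as null.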

Here is the full code to get Telegram channel members:

Getting channel messages

Before starting this step, you need to add these imports to the top of your script:

After updating the imports, creating a Telegram client in your Python code is exactly the same as in the previous section. Getting the channel ID or URL from the user is also the same. So, I’m assuming you have your Telegram client ready and you’ve created a channel object, which I call my_channel:

Sending a GetHistoryRequest object to the Telegram client returns a history object with a list of messages. Again, we have a limit of 100 messages per request. So we loop this query in an infinite loop. After each request, we check if the history object has a messages property. If not, we’ve reached the end of messages in the channel, so we can exit the loop.

I also added a total_count_limit variable. You may not want to receive all messages, or it may take too long to receive all messages, so you can set how many messages you want to receive from a channel. If set to 0, the script will receive all messages from the channel.

This time setting the offset is a bit tricky. GetHistoryRequest takes an offset_id, which is the ID of the message from which it should start returning history. Whenever you get a list of messages, you need to set the offset to the ID of the last message in that list:

To save a message as JSON data, you need to convert the message object to a dictionary. You can use the to_dict method to get a message object in dictionary format:

The last two lines of the loop check two conditions: that total_count_limit is set to a value greater than 0, and that the number of messages received has reached that limit. If both conditions are met, the loop breaks.
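Putting the pieces of this section together, the loop can be sketched like this, assuming Telethon 1.x. As a design choice of mine, the paging logic (collect_messages) is kept separate from the Telegram request (telethon_history_fetcher) so it can be tried offline; both helper names are hypothetical:

```python
async def collect_messages(fetch_batch, limit=100, total_count_limit=0):
    """Page backwards through a channel, one batch of messages at a time."""
    offset_id = 0  # 0 means "start from the newest message"
    all_messages = []
    while True:
        messages = await fetch_batch(offset_id, limit)
        if not messages:
            break  # no more history to read
        all_messages.extend(message.to_dict() for message in messages)
        # Continue from the last (oldest) message of this batch.
        offset_id = messages[-1].id
        # A limit of 0 means "fetch everything".
        if total_count_limit > 0 and len(all_messages) >= total_count_limit:
            break
    return all_messages


def telethon_history_fetcher(client, my_channel):
    """Build a fetch_batch coroutine backed by GetHistoryRequest."""
    # Imported here so the loop above works without Telethon installed.
    from telethon.tl.functions.messages import GetHistoryRequest

    async def fetch_batch(offset_id, limit):
        history = await client(GetHistoryRequest(
            peer=my_channel,
            offset_id=offset_id,
            offset_date=None,
            add_offset=0,
            limit=limit,
            max_id=0,
            min_id=0,
            hash=0,
        ))
        return history.messages

    return fetch_batch
```

Because the limit is only checked after a whole batch has been added, the result can contain somewhat more than total_count_limit messages.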

Now that you have all the message data, you can save this list to a JSON file.
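One thing to watch for: the dictionaries returned by to_dict contain datetime objects (and sometimes bytes), which json.dump cannot serialize on its own. A small custom encoder is one way around that. The sketch below is an assumption about how you might handle it, and the channel_messages.json file name is just an example:

```python
import json
from datetime import datetime


class DateTimeEncoder(json.JSONEncoder):
    """JSON encoder that also handles datetime and bytes values."""

    def default(self, o):
        if isinstance(o, datetime):
            return o.isoformat()  # e.g. "2023-06-02T12:00:00"
        if isinstance(o, bytes):
            return list(o)  # store raw bytes as a list of integers
        return super().default(o)


def save_messages(all_messages, path="channel_messages.json"):
    """Dump the list of message dictionaries to a JSON file."""
    with open(path, "w", encoding="utf-8") as outfile:
        json.dump(all_messages, outfile, cls=DateTimeEncoder, ensure_ascii=False)
```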
