Multiple Files Keyword Search Using Python
In this tutorial we will learn how to find specific text or words in multiple text files. File handling is one of the core topics of Python I/O, so let's get started!
Python is a high-level language that is very popular for scripting. For this part we are going to do the following:
- Use the open() function to open different files.
- Loop over the text and search for the word in each text file.
- Find and return the text files that contain our keyword.
Create the file tree:
Let's add a directory that contains 3 text files. We'll name them Text(n).txt, with n running from 1 through 3.
Then we will create our Python script and call it searchText.py.
We will make sure that everything is in the same directory.
I have created three text files that contain the following text:
Text1.txt
Hello World
My Name is Scraber
I will search for content all over your files
Text2.txt
Hello World
My Name is Crawler
I will search for keywords all over your files
Text3.txt
Hello World
My Name is Scraber
I will search for keywords
Print out all directory files:
Let's first import the os library and print out all the files in our current directory:
import os
#Print all files and folders in the searchText.py directory
print(os.listdir())
The output should look similar to this, showing 4 file names:
Output:
['searchText.py', 'Text1.txt', 'Text2.txt', 'Text3.txt']
As we can see, all files were grabbed, but we are only interested in .txt files. All other file types need to be ignored.
Separate the file name from the extension in Python:
We will use Python's split() string method. When passed ".", it splits the text at every dot, which for our file names gives two sections:
- what comes before the ".".
- what comes after the ".".
import os

new_file_name = []
for file in os.listdir():
    # Split each name at the dot into [name, extension]
    new_file_name.append(file.split('.'))
print(new_file_name)
Output:
[['searchText', 'py'], ['Text1', 'txt'], ['Text2', 'txt'], ['Text3', 'txt']]
With that, we have separated the file name from the file extension.
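One caveat with split('.') is that a name containing more than one dot produces more than two pieces. The sketch below illustrates the difference, using the standard library function os.path.splitext as one possible alternative. The file name archive.backup.txt is made up for this example and is not one of the tutorial's files.

```python
import os

# Hypothetical file name with two dots, used only for illustration
name = "archive.backup.txt"

# split('.') cuts at every dot, so we get three pieces here
parts = name.split(".")
print(parts)      # ['archive', 'backup', 'txt']

# os.path.splitext cuts only at the last dot and keeps the dot in the extension
root, ext = os.path.splitext(name)
print(root, ext)  # archive.backup .txt
```

For the three files in this tutorial both approaches give the same name and extension, so the choice only matters for less regular file names.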
What we need now is to throw away any non-.txt files.
Keeping only .txt files:
So we will create a new list called marge_file_name, containing only the .txt files.
import os

new_file_name = []
for file in os.listdir():
    new_file_name.append(file.split('.'))

marge_file_name = []
for i in range(len(new_file_name)):
    # Keep the entry only if one of its pieces is 'txt', then rebuild the name
    if 'txt' in new_file_name[i]:
        marge_file_name.append('.'.join(new_file_name[i]))
print(marge_file_name)
Output:
['Text1.txt', 'Text2.txt', 'Text3.txt']
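The check 'txt' in new_file_name[i] looks at every piece of the split name, so a file called txt.py would also pass it. As a sketch of a stricter alternative, the helper below keeps a name only when it ends with .txt. The function name txt_files and the extra txt.py entry are assumptions made for this example.

```python
def txt_files(names):
    """Return only the names that end with the .txt extension."""
    return [name for name in names if name.endswith(".txt")]


# A fixed list stands in for os.listdir() so the result is predictable
sample = ["searchText.py", "Text1.txt", "Text2.txt", "Text3.txt", "txt.py"]
print(txt_files(sample))  # ['Text1.txt', 'Text2.txt', 'Text3.txt']
```

In the script, the same helper could be called as txt_files(os.listdir()) after importing os.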
Taking the user's keyword to search for in the files:
Now we are going to take input from the user with the input() function. Then we are going to loop through the marge_file_name list and read each file's data using the open() function, like the following:
user_word = input("Enter your word:")
for name in marge_file_name:
    # Open each .txt file and read all of its lines into a list
    with open(name, mode='r') as file:
        data = file.readlines()
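It can help to see what readlines() actually returns: a list with one string per line, where each string still carries its trailing newline character. The sketch below writes a throwaway file first so that it runs on its own. The file name demo.txt and the explicit encoding="utf-8" argument are assumptions for this example, not part of the tutorial's script.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on Text1.txt existing
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo.txt")
    with open(path, mode="w", encoding="utf-8") as file:
        file.write("Hello World\nMy Name is Scraber\n")

    # Read it back the same way the tutorial does
    with open(path, mode="r", encoding="utf-8") as file:
        data = file.readlines()

print(data)  # ['Hello World\n', 'My Name is Scraber\n']
```

Because every element is one line of the file, the position of an element in the list plus one is its line number in the file.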
Search for the keyword in every .txt file:
Now we are going to loop over the data variable (the lines we read from each file) and check whether the user's word appears in each line. If it does, we will print the line number where it was found along with a message.
Note that the search is case sensitive.
user_word = input("Enter your word:")
for name in marge_file_name:
    with open(name, mode='r') as file:
        data = file.readlines()
    # print(data)
    # Check every line of the current file for the keyword
    for i in range(len(data)):
        if user_word in data[i]:
            print(f'Yes founded in line: {i+1} in {name} file ')
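Since the search above is case sensitive, typing scraber would not match Scraber. If a case-insensitive search is wanted, one possible approach is to normalise both strings before comparing them. This is a minimal sketch. The helper name line_matches is hypothetical, and str.casefold() is used here as a stricter variant of lower().

```python
def line_matches(word, line):
    """Return True when word occurs in line, ignoring upper/lower case."""
    return word.casefold() in line.casefold()


print(line_matches("scraber", "My Name is Scraber\n"))  # True
print(line_matches("Crawler", "My Name is Scraber\n"))  # False
```

In the loop, the condition user_word in data[i] could then be swapped for line_matches(user_word, data[i]).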
Results:
Let's search for the keyword Scraber.
The output would look like this:
Output:
Enter your word:Scraber
Yes founded in line: 4 in Text1.txt file
Yes founded in line: 4 in Text3.txt file
As we can see, this word exists in Text1.txt and Text3.txt.
Full Multiple Text Keyword search source code:
# -*- coding: utf-8 -*-
"""
Created on Thu Sep 8 11:17:40 2022
@author: Hamsho
"""
import os

# Split every name in the current directory into [name, extension]
new_file_name = []
for file in os.listdir():
    new_file_name.append(file.split('.'))

# Keep only the .txt files and rebuild their full names
marge_file_name = []
for i in range(len(new_file_name)):
    if 'txt' in new_file_name[i]:
        marge_file_name.append('.'.join(new_file_name[i]))
# print(marge_file_name)

user_word = input("Enter your word:")
for name in marge_file_name:
    with open(name, mode='r') as file:
        data = file.readlines()
    # print(data)
    # Report every line of the current file that contains the keyword
    for i in range(len(data)):
        if user_word in data[i]:
            print(f'Yes founded in line: {i+1} in {name} file ')
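As a possible variation, the same logic can be wrapped in a function that returns its matches instead of printing them, which makes it easier to reuse and to test. This is only a sketch under a few assumptions: the function name search_files, the folder parameter, the explicit encoding="utf-8" argument, and the demo files a.txt, b.txt and notes.py are all made up for this example.

```python
import os
import tempfile


def search_files(folder, word):
    """Return (file name, line number) pairs for every line containing word."""
    matches = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".txt"):
            continue  # skip anything that is not a .txt file
        path = os.path.join(folder, name)
        with open(path, mode="r", encoding="utf-8") as file:
            # enumerate(..., start=1) yields line numbers that start at 1
            for number, line in enumerate(file, start=1):
                if word in line:
                    matches.append((name, number))
    return matches


# Demonstration on throwaway files (names and contents are made up)
with tempfile.TemporaryDirectory() as folder:
    samples = {
        "a.txt": "Hello World\nMy Name is Scraber\n",
        "b.txt": "Hello World\nMy Name is Crawler\n",
        "notes.py": "# Scraber\n",
    }
    for name, text in samples.items():
        with open(os.path.join(folder, name), "w", encoding="utf-8") as file:
            file.write(text)

    results = search_files(folder, "Scraber")

print(results)  # [('a.txt', 2)]
```

Returning a list keeps the searching separate from the printing, so the caller decides how to present the results.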