Multiple-File Keyword Search Using Python

In this tutorial we will learn how to find specific text or words in multiple text files. This is one of the core topics of Python I/O, so let's get started!

Python is a high-level language that is very popular for scripting. For this part we are going to do the following:

  • Use the open() function to open different files.
  • Loop over the text of each file and search for the word.
  • Report the text files that contain our keyword.

Create The File Tree:


Let's add a directory that contains three text files. We'll just name them Text(n).txt, 1 through 3.

Then we will create our Python script; we will call it searchText.py.

We will make sure that everything is in the same directory.

I have created three text files with the following content:

Text1.txt

Hello World


My Name is Scraber

I will search for content all over tour files


Text2.txt

Hello World


My Name is Crawler

I will search fkeywords all over your files


Text3.txt

Hello World


My Name is Scraber

I will search for keywords
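
If you would rather generate the sample files than type them by hand, the following is a minimal sketch. The helper name create_samples and the use of a temporary directory are assumptions made for this illustration. The text is copied as shown in the listings above, and the blank lines are assumed to be part of each file, so that the name falls on line 4.

```python
import tempfile
from pathlib import Path

# Lines of each sample file, copied as shown in the listings above.
# Assumption: two blank lines follow "Hello World", so the name is on line 4.
SAMPLES = {
    "Text1.txt": ["Hello World", "", "", "My Name is Scraber", "",
                  "I will search for content all over tour files"],
    "Text2.txt": ["Hello World", "", "", "My Name is Crawler", "",
                  "I will search fkeywords all over your files"],
    "Text3.txt": ["Hello World", "", "", "My Name is Scraber", "",
                  "I will search for keywords"],
}


def create_samples(directory):
    """Write the sample files into directory and return their sorted names."""
    directory = Path(directory)
    for name, lines in SAMPLES.items():
        (directory / name).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return sorted(path.name for path in directory.glob("*.txt"))


# A temporary directory keeps the demo from touching your own files.
with tempfile.TemporaryDirectory() as tmp:
    created = create_samples(tmp)
    print(created)  # ['Text1.txt', 'Text2.txt', 'Text3.txt']
```

To create the files next to searchText.py instead, you could call create_samples(".") from that directory.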

Print all directory files:


Let's first import Python's os library and print out all the files in our current directory:



  import os

  # Print all files and folders in the current working directory
  print(os.listdir())
        

The output should look similar to this, showing four file names:


Output:

['searchText.py', 'Text1.txt', 'Text2.txt', 'Text3.txt']

As we can see, all files were listed, but we are only interested in .txt files. All other file types need to be ignored.
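
One detail worth knowing: os.listdir() with no argument lists the current working directory, which is the script's folder only when the script is started from there. If you want the listing to follow the script itself, a small helper along these lines may help. The name script_directory is made up for this sketch, and the path used in the demo is only an example.

```python
import os


def script_directory(script_path):
    """Return the absolute directory that contains script_path."""
    return os.path.dirname(os.path.abspath(script_path))


# Inside searchText.py you would pass __file__, for example:
#     print(os.listdir(script_directory(__file__)))
# Here an example path is built so the snippet runs anywhere.
example = script_directory(os.path.join(os.getcwd(), "searchText.py"))
print(example)
```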

Separate file name from extension in Python:


We will use Python's split() string method, which, when passed ".", splits the text at every "." and returns a list. For a name with a single dot, that gives two pieces:

  • what comes before the "."
  • what comes after the "."


  import os

  new_file_name = []
  for file in os.listdir():
      new_file_name.append(file.split('.'))

  print(new_file_name)
        

Output:

[['searchText', 'py'], ['Text1', 'txt'], ['Text2', 'txt'], ['Text3', 'txt']]


With that, we have separated the file name from the file extension.
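
Keep in mind that split('.') cuts at every dot, so a name with more than one dot yields more than two pieces. As an alternative sketch, the standard library's os.path.splitext() cuts only at the last dot and keeps the dot in the extension. The names notes.backup.txt and README below are hypothetical and are not part of the tutorial's folder.

```python
import os

# 'notes.backup.txt' and 'README' are made-up names for illustration.
names = ['searchText.py', 'Text1.txt', 'notes.backup.txt', 'README']

pairs = [os.path.splitext(name) for name in names]
for name, pair in zip(names, pairs):
    # Left: every piece from split('.'); right: (root, extension) from splitext
    print(name.split('.'), pair)

# ['searchText', 'py'] ('searchText', '.py')
# ['Text1', 'txt'] ('Text1', '.txt')
# ['notes', 'backup', 'txt'] ('notes.backup', '.txt')
# ['README'] ('README', '')
```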

What we need now is to throw away any non .txt files.

Keeping only .txt files:


So we will create a new list called marge_file_name, containing only the .txt files.



  import os

  new_file_name = []
  for file in os.listdir():
      new_file_name.append(file.split('.'))


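  # Keep only the names whose last piece (the extension) is 'txt'
  marge_file_name = []
  for i in range(len(new_file_name)):
      if len(new_file_name[i]) > 1 and new_file_name[i][-1] == 'txt':
          marge_file_name.append('.'.join(new_file_name[i]))

  print(marge_file_name)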
        

Output:

['Text1.txt', 'Text2.txt', 'Text3.txt']
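
The split-and-join steps can also be folded into a single filter. The sketch below is one possible shortcut, not a replacement for the walkthrough: it uses str.endswith() to test the extension directly. The function name list_txt_files and the extra file txt.py are assumptions added for the demo; txt.py is there to show that a name merely containing "txt" is not picked up.

```python
import os
import tempfile


def list_txt_files(directory="."):
    """Return the sorted names of regular files in directory ending in '.txt'."""
    return sorted(
        name for name in os.listdir(directory)
        if name.endswith(".txt")
        and os.path.isfile(os.path.join(directory, name))
    )


# Demo in a throwaway directory so the result does not depend on your files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["searchText.py", "Text1.txt", "Text2.txt", "Text3.txt", "txt.py"]:
        open(os.path.join(tmp, name), "w").close()  # create an empty file
    txt_files = list_txt_files(tmp)
    print(txt_files)  # ['Text1.txt', 'Text2.txt', 'Text3.txt']
```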

Taking the keyword to search for from the user:


Now we are going to take input from the user with the input() function. Then we will loop through the marge_file_name list and read each file's data using the open() function, like the following:



  user_word = input("Enter your word:")
  for name in marge_file_name:

      with open(name,mode='r') as file:
          data = file.readlines()
        

Search for keyword in every .txt file:


Now we are going to loop over the data variable (the list of lines read from each file) and check whether the user's word appears in each line. If it is found, we print the line number along with a message.

Note that the search is case sensitive.



  user_word = input("Enter your word:")
  for name in marge_file_name:

      with open(name,mode='r') as file:
          data = file.readlines()

      # print(data)


      for i in range(len(data)):
          if user_word in data[i]:
              print(f'Yes founded in line: {i+1} in {name} file ')
          else:

              pass
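
Because the in operator compares characters exactly, searching for "scraber" would not find "Scraber". If a case-insensitive search is what you want, one possible approach is to normalise both sides before comparing. The helper line_matches and its ignore_case parameter are hypothetical names for this sketch.

```python
def line_matches(line, word, ignore_case=False):
    """Return True if word occurs in line, optionally ignoring letter case."""
    if ignore_case:
        # casefold() is a more thorough lower() meant for caseless comparison
        return word.casefold() in line.casefold()
    return word in line


line = "My Name is Scraber\n"
print(line_matches(line, "scraber"))                    # False
print(line_matches(line, "scraber", ignore_case=True))  # True
```

In the loop above, the test "user_word in data[i]" could then be swapped for a call such as line_matches(data[i], user_word, ignore_case=True).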
 

Results:

Let's search for the keyword Scraber.

The output would look like this:


Output:


Enter your word:Scraber

Yes founded in line: 4 in Text1.txt file

Yes founded in line: 4 in Text3.txt file


As we can see, this word exists in Text1.txt and Text3.txt.
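
If you want to reuse the search from other code, the same steps can be wrapped in a function that returns its findings instead of printing them. This is only a sketch under a few assumptions: the function name search_files is made up, the files are assumed to be UTF-8 text, and the demo writes shortened copies of the sample files into a temporary directory.

```python
import os
import tempfile


def search_files(directory, word):
    """Return (file name, line number) pairs for .txt lines containing word."""
    matches = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".txt"):
            continue  # skip everything that is not a .txt file
        path = os.path.join(directory, name)
        with open(path, mode="r", encoding="utf-8") as file:
            # enumerate() numbers the lines, starting from 1
            for line_number, line in enumerate(file, start=1):
                if word in line:
                    matches.append((name, line_number))
    return matches


# Demo with shortened copies of the sample files.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "Text1.txt": "Hello World\n\n\nMy Name is Scraber\n",
        "Text2.txt": "Hello World\n\n\nMy Name is Crawler\n",
        "Text3.txt": "Hello World\n\n\nMy Name is Scraber\n",
    }
    for name, text in samples.items():
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as file:
            file.write(text)
    results = search_files(tmp, "Scraber")
    print(results)  # [('Text1.txt', 4), ('Text3.txt', 4)]
```

Returning a list keeps the search separate from the printing, so the caller decides how to present the matches.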

Full multiple-file keyword search source code:


    # -*- coding: utf-8 -*-
    """
    Created on Thu Sep  8 11:17:40 2022

    @author: Hamsho
    """

    import os


    new_file_name = []
    for file in os.listdir():
        new_file_name.append(file.split('.'))


    # Keep only the names whose last piece (the extension) is 'txt'
    marge_file_name = []
    for i in range(len(new_file_name)):
        if len(new_file_name[i]) > 1 and new_file_name[i][-1] == 'txt':
            marge_file_name.append('.'.join(new_file_name[i]))

    # print(marge_file_name)

    user_word = input("Enter your word:")
    for name in marge_file_name:

        with open(name,mode='r') as file:
            data = file.readlines()

        # print(data)


        for i in range(len(data)):
            if user_word in data[i]:
                print(f'Yes founded in line: {i+1} in {name} file ')
            else:
                pass