Screenshot capture feature #270

Open
PSNAppz opened this issue Jan 12, 2023 · 7 comments

Comments

@PSNAppz
Member

PSNAppz commented Jan 12, 2023

Is your feature request related to a problem? Please describe.
Screenshot capturing is a useful feature that can be added to an OSINT tool, as it allows the tool to take screenshots of the pages it crawls and save them to the database or file system. This can be useful for creating a visual record of the pages that have been crawled, which can be helpful for documenting the results of the crawling process. Additionally, it can be used for creating an archive of the crawled pages, which can be useful for analyzing changes over time.

Describe the solution you'd like
With this feature, the tool can take screenshots at different resolutions and viewport sizes, and even capture the whole webpage, using a library such as Puppeteer or Selenium.
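
One possible shape for this, as a rough sketch and not TorBot's actual design: a small Python helper built on Selenium that sets the viewport, loads a page, and saves a PNG to the file system. The names `screenshot_path`, `capture`, and `make_headless_driver`, the file-naming scheme, and the default viewport size are all assumptions made for illustration. Routing the browser through Tor is not covered here.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlparse


def screenshot_path(url: str, out_dir: Path) -> Path:
    """Derive a stable file name for a URL (hypothetical naming scheme)."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    host = urlparse(url).hostname or "unknown"
    return out_dir / f"{host}_{digest}.png"


def capture(driver, url: str, out_dir: Path, width: int = 1280, height: int = 800) -> Path:
    """Load `url` in an existing Selenium driver and save a viewport screenshot."""
    out_dir.mkdir(parents=True, exist_ok=True)
    driver.set_window_size(width, height)  # controls the captured viewport
    driver.get(url)
    target = screenshot_path(url, out_dir)
    # save_screenshot returns False if the file could not be written
    if not driver.save_screenshot(str(target)):
        raise RuntimeError(f"could not save screenshot for {url}")
    return target


def make_headless_driver():
    """Create a headless Chrome driver (assumes Selenium and Chrome are installed)."""
    from selenium import webdriver  # imported lazily so the helpers above stay stdlib-only

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    return webdriver.Chrome(options=options)
```

Something like `capture(make_headless_driver(), "https://www.example.com", Path("screenshots"))` would then write one PNG per URL. With Chrome, `save_screenshot` captures only the visible viewport, so whole-page capture would need a different approach.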

Describe alternatives you've considered
N/A

Additional context
It can also be useful for creating a visual comparison of the pages before and after a specific event.

@KingAkeem
Member

If anyone wants to take on this task before I do, here's some context.

You should make use of the LinkTree class, which uses treelib to construct a tree data structure that can be printed, downloaded, and used with tree operations such as searching the tree.
https://github.com/DedSecInside/TorBot/blob/dev/torbot/modules/linktree.py

Using the class requires passing the root node of the tree and how far you would like the tree to be built, depth-wise:

from torbot.modules.linktree import LinkTree  # import path assumed from the file linked above

tree = LinkTree(root="https://www.example.com", depth=2)  # builds tree on instantiation
tree.show()  # prints tree to std output
tree.save("test.txt")  # saves tree results to `test.txt`

The tree nodes currently only save the URL, but treelib has a mechanism to extend nodes to store data.
https://treelib.readthedocs.io/en/latest/index.html#advanced-usage

import requests
from treelib import Tree

class WebMetadata:
    """Per-page data to attach to a tree node."""
    def __init__(self, html, headers):
        self.html = html
        self.headers = headers

# using treelib library
tree = Tree()
resp = requests.get("https://www.example.com")
# passing html and headers as the node's data
tree.create_node("root", "root", data=WebMetadata(resp.text, resp.headers))
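
Building on the node-data idea above, a screenshot location could sit next to the HTML and headers. This is only a sketch: `PageSnapshot`, `add_page`, and `pages_missing_screenshot` are hypothetical names, and `tree` is expected to be a `treelib.Tree` (only its `create_node` and `all_nodes` methods are used).

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class PageSnapshot:
    """Hypothetical node payload: page HTML, headers, and a screenshot path."""
    html: str
    headers: dict
    screenshot: Optional[Path] = None  # stays None until a capture succeeds


def add_page(tree, url: str, snapshot: PageSnapshot, parent: Optional[str] = None):
    """Create a node keyed by URL and store the snapshot as its data."""
    return tree.create_node(tag=url, identifier=url, parent=parent, data=snapshot)


def pages_missing_screenshot(tree) -> list:
    """List node identifiers whose snapshot has no screenshot yet."""
    return [
        node.identifier
        for node in tree.all_nodes()
        if node.data is not None and node.data.screenshot is None
    ]
```

Keeping the path on the node, and not the image bytes, keeps the tree small and leaves the choice between file system and database storage open.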

@pavankalyan767

Is this issue closed or open?

@PSNAppz
Member Author

PSNAppz commented Oct 15, 2023

@pavankalyan224847 This is open and not assigned to anyone.

@KingAkeem
Member

The earlier comment (#270 (comment)) is out of date.

LinkTree still exists but has been completely refactored. You can check out the refactored code here:

https://github.com/DedSecInside/TorBot/blob/dev/torbot/modules/linktree.py

Let me know if you have any questions.

@pavankalyan767

Can you assign this to me? I would like to work on it.

@KingAkeem
Member

@pavankalyan224847 Done!

@KingAkeem
Member

Updates?
