socio4health

Overview

socio4health is an extraction, transformation and loading (ETL) classification tool that simplifies collecting data from multiple sources and merging it into a harmonized dataset. It focuses on sociodemographic and census datasets from Colombia, Brazil, and Peru.
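Here is a minimal sketch of what "harmonized" can mean in practice. It uses plain Python (standard library only) and does not call socio4health. It assumes each source names the same variables differently. A per-source mapping renames columns to a shared schema, and the rows are then stacked into one table, with missing variables left as None. The source labels, column names, and the `harmonize_rows` helper are hypothetical examples, not the package's actual schema or API.

```python
from typing import Dict, Iterable, List, Optional

Row = Dict[str, object]


def harmonize_rows(
    sources: Dict[str, Iterable[Row]],
    mappings: Dict[str, Dict[str, str]],
    schema: List[str],
) -> List[Dict[str, Optional[object]]]:
    """Rename each source's columns to a shared schema and stack the rows.

    Hypothetical helper for illustration only; not part of socio4health.
    """
    harmonized = []
    for source_name, rows in sources.items():
        mapping = mappings.get(source_name, {})
        for row in rows:
            # Rename columns that have a mapping; keep the others unchanged.
            renamed = {mapping.get(col, col): value for col, value in row.items()}
            # Keep only schema columns, filling absent ones with None,
            # and record which source each row came from.
            record = {col: renamed.get(col) for col in schema}
            record["source"] = source_name
            harmonized.append(record)
    return harmonized


if __name__ == "__main__":
    # Hypothetical survey extracts that use different column names.
    sources = {
        "survey_a": [{"edad": 34, "sexo": "F"}],
        "survey_b": [{"idade": 51}],
    }
    mappings = {
        "survey_a": {"edad": "age", "sexo": "sex"},
        "survey_b": {"idade": "age"},
    }
    for record in harmonize_rows(sources, mappings, schema=["age", "sex"]):
        print(record)
```

In pandas, the same steps correspond to `DataFrame.rename(columns=...)`, `DataFrame.reindex(columns=...)`, and `pd.concat(..., ignore_index=True)`.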

- Retrieve data from online sources through web scraping, as well as from local files.
- Read a range of data formats, including .csv, .xlsx, .xls, .txt, .sav, fixed-width files, and geospatial files.
- Consolidate extracted data into a pandas (or Dask) DataFrame.
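Fixed-width files are one of the supported formats and are common for census microdata. The sketch below shows what reading one involves. It slices each line into fields from a list of (name, start, end) column specifications and strips the padding. It uses only the standard library. The layout shown is hypothetical, and socio4health's own readers may work differently.

```python
from typing import Dict, Iterable, List, Tuple

# (column name, start offset, end offset), using Python slice semantics:
# offsets are 0-based, start is inclusive, end is exclusive.
ColSpec = Tuple[str, int, int]


def parse_fixed_width(lines: Iterable[str], colspecs: List[ColSpec]) -> List[Dict[str, str]]:
    """Split each fixed-width line into named, whitespace-stripped fields."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        records.append({name: line[start:end].strip() for name, start, end in colspecs})
    return records


if __name__ == "__main__":
    # Hypothetical layout: 2-char region code, 3-char age, 1-char sex code.
    sample = ["05 34F\n", "11 51M\n"]
    layout = [("region", 0, 2), ("age", 2, 5), ("sex", 5, 6)]
    print(parse_fixed_width(sample, layout))
```

pandas performs the same operation with `pandas.read_fwf`. Its `colspecs` argument also takes half-open `(start, end)` intervals.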

Dependencies

- Dask: a flexible parallel computing library for analytics.
- Pandas: a widely used open-source data analysis and manipulation tool.
- GeoPandas: Python tools for geographic data.
- NumPy: the fundamental package for scientific computing with Python.
- Scrapy: a framework for extracting the data you need from websites.
- Matplotlib: a library for creating static, animated, and interactive visualizations in Python.
- Torch (PyTorch): a Python package for tensor computation and deep neural networks.
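Before installing, you may want to see which of these libraries your environment already has. A minimal standard-library check follows. It assumes the import names match the package names listed above (`dask`, `pandas`, `geopandas`, `numpy`, `scrapy`, `matplotlib`, `torch`). It only tests whether a module can be found, not whether its version is compatible with socio4health.

```python
import importlib.util
from typing import Iterable, List

# Import names assumed for the dependencies listed above.
DEPENDENCIES = ["dask", "pandas", "geopandas", "numpy", "scrapy", "matplotlib", "torch"]


def missing_dependencies(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_dependencies(DEPENDENCIES)
    print("Missing:", ", ".join(missing) if missing else "none")
```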

Installation

socio4health can be installed via pip from PyPI.

```
# Install using pip
pip install socio4health
```
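To confirm that pip installed the package, you can read its installed version from the package metadata. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is our own, and it returns None when the distribution is not installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("socio4health")
    print(f"socio4health {found}" if found else "socio4health is not installed")
```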

How to Use it

To use the socio4health package, follow these steps:

Import the package in your Python script:

```
from socio4health import Extractor
from socio4health import Harmonizer
```

Create an instance of the Extractor class:
```
extractor = Extractor()
```

Extract data from online sources and store the resulting list of data information:
```
url = 'https://www.example.com'
depth = 0
ext = 'csv'
list_datainfo = extractor.s4h_extract(url=url, depth=depth, ext=ext)
```

Create an instance of the Harmonizer class:
```
harmonizer = Harmonizer()
```
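Judging by the parameter names `url`, `depth`, and `ext`, `s4h_extract` appears to scan a web page for links to files with the given extension and to follow links `depth` levels deep. That reading is an assumption, so check the socio4health documentation for the exact behaviour. The sketch below illustrates only the depth-0 case, using the standard library. It parses an HTML string, resolves each `href` against the page URL, and keeps the links whose path ends in `.ext`. The `find_file_links` name is hypothetical, and the sketch makes no network request.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_file_links(html: str, base_url: str, ext: str) -> List[str]:
    """Return unique absolute URLs of links on one page whose path ends in .ext."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    suffix = "." + ext.lower().lstrip(".")
    links: List[str] = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)
        # Check the URL path so that query strings do not hide the extension.
        if urlparse(absolute).path.lower().endswith(suffix) and absolute not in links:
            links.append(absolute)
    return links


if __name__ == "__main__":
    page = '<a href="data/census.csv">Census</a> <a href="/about">About</a>'
    print(find_file_links(page, "https://www.example.com/", "csv"))
```

Going deeper than depth 0 would mean repeating this step on each linked HTML page while tracking visited URLs. Scrapy, one of the listed dependencies, is designed for that kind of crawling.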

For more detailed examples and use cases, please refer to the socio4health documentation.

Resources

Package Website

The socio4health package website includes an API reference, a user guide, and examples. The site mainly covers the release version, but documentation for the latest development version is also available.

Organisation Website

Harmonize is an international project that develops cost-effective and reproducible digital tools for stakeholders in Latin America and the Caribbean (LAC) affected by a changing climate. These stakeholders include cities, small islands, highlands, and the Amazon rainforest.

The project consists of resources and tools developed in conjunction with different teams from Brazil, Colombia, Dominican Republic, Peru, and Spain.

Organizations

Authors / Contact information

Below is the contact information for the authors and contributors, in case you have questions or feedback.

Diego Irreño (developer)
Erick Lozano (developer)
Juan Montenegro (developer)
Ingrid Mora (documentation)
